Article

Marine Vessel Classification and Multivariate Trajectories Forecasting Using Metaheuristics-Optimized eXtreme Gradient Boosting and Recurrent Neural Networks

by Aleksandar Petrovic 1,†, Robertas Damaševičius 2,*,†, Luka Jovanovic 1,†, Ana Toskovic 3,†, Vladimir Simic 4,5,†, Nebojsa Bacanin 1,6,†, Miodrag Zivkovic 1,† and Petar Spalević 7,†
1 Faculty of Informatics and Computing, Singidunum University, 11010 Belgrade, Serbia
2 Department of Applied Informatics, Vytautas Magnus University, 44404 Kaunas, Lithuania
3 Teacher Education Faculty, University of Pristina in Kosovska Mitrovica, 38220 Kosovska Mitrovica, Serbia
4 Faculty of Transport and Traffic Engineering, University of Belgrade, Vojvode Stepe 305, 11010 Belgrade, Serbia
5 Department of Industrial Engineering and Management, College of Engineering, Yuan Ze University, Yuandong Road, Zhongli District, Taoyuan City 320315, Taiwan
6 MEU Research Unit, Middle East University, Amman 11831, Jordan
7 Faculty of Technical Science, University of Pristina in Kosovska Mitrovica, 38220 Kosovska Mitrovica, Serbia
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Appl. Sci. 2023, 13(16), 9181; https://doi.org/10.3390/app13169181
Submission received: 8 July 2023 / Revised: 28 July 2023 / Accepted: 7 August 2023 / Published: 11 August 2023
(This article belongs to the Special Issue Intelligent Systems Applied to Maritime Environment Monitoring)

Abstract: Maritime vessels provide a wealth of data concerning location, trajectories, and speed. However, while these data are meticulously monitored and logged to maintain course, they can also provide a wealth of meta information. This work explored the potential of data-driven techniques and applied artificial intelligence (AI) to tackle two challenges. First, vessel classification was explored through the use of extreme gradient boosting (XGBoost). Second, vessel trajectory time series forecasting was tackled through the use of long short-term memory (LSTM) networks. Finally, due to the strong dependence of AI model performance on proper hyperparameter selection, a boosted version of the well-known particle swarm optimization (PSO) algorithm was introduced specifically for tuning the hyperparameters of the models used in this study. The introduced methodology was applied to real-world automatic identification system (AIS) data for both marine vessel classification and trajectory forecasting. The performance of the introduced boosted PSO (BPSO) was compared to contemporary optimizers and showed promising outcomes. The XGBoost model tuned using the BPSO attained an overall accuracy of 99.72% for the vessel classification problem, while the LSTM model attained a mean square error (MSE) of 0.000098 for the marine trajectory prediction challenge. A rigorous statistical analysis of the classification model was performed to validate the outcomes, and explainable AI principles were applied to the best-performing models to gain a better understanding of the feature impacts on model decisions.

1. Introduction

Maritime vessels, as a critical component of global trade and transportation, generate a wealth of data concerning their location, trajectories, and speed. These data, meticulously monitored and logged to maintain course, can also provide a wealth of meta information that can be harnessed for various purposes. Historically, the importance of marine vessel classification and data analysis can be traced back to early geographical and historical contexts. Understanding the geographical location, topographical features, and historical exploration of maritime routes provides the foundation for the modern system of maritime transportation [1].
In recent years, advancements in data analysis techniques have enabled the extraction of more value from the data generated by maritime vessels. For instance, the application of density-based spatial clustering of applications with noise (DBSCAN) has been used to model vessel behaviors based on trajectory point data. This approach allows for the detection of vessel trajectory anomalies, such as unexpected stops, deviations from regulated routes, or inconsistent speed, thereby contributing to situational awareness, vessel collision prevention, safe navigation, and route planning. Moreover, the concepts of ‘big trajectories’ and ‘semantic trajectories’ have emerged, referring to enriched sets of trajectory data that combine multiple semantic aspects with pure spatio-temporal facets [2]. This approach allows for a more comprehensive analysis of trajectory data, unveiling patterns in the life of humans, objects, and animals, and informing policy-making in various application domains in transportation [3], security, health, tourism, and the environment. Specifically, in the context of fishing vessels, real-time anomaly detection models based on edge computing have been developed [4,5]. These models make full use of the information of moving edge nodes and nearby nodes and combine a historical trajectory extraction detection model with an online anomaly detection model to detect anomalies. This approach has proven effective for real-time anomaly detection of fishing vessels, providing valuable information about promising fishing locations and weather anomalies.
Activities related to naval travel are difficult to track due to their nature and the lack of sensors that can record data. In most cases, the main monitoring system for a vessel is its GPS. AIS is used to track the trajectories of vessels and offers more precise information on the sailed route [6,7,8]. The observed data offer insight into anomalies in the planned route, such as diversions from the course and inexplicable obstructions. These are usually signs of illegal activity, such as poaching and dark fishing. The latter occurs when fishing crews turn off their AIS-linked transponders in order to conceal illegal activities. The use of unregistered gear, as well as more vessels than permitted fishing in certain areas, damages wildlife. One example is the use of two vessels in pair trawling. All of the mentioned behaviors are harmful to the environment and cause disturbances in marine life. To address these problems, vessel track prediction can be used [9,10,11].
The purpose of this research was to propose a novel framework that combines an XGBoost model for vessel classification with a PSO-optimized LSTM network for trajectory forecasting. Metaheuristics, especially the swarm subfamily, have proven to be excellent optimizers for non-deterministic polynomial-time hard (NP-hard) problems, to which LSTM hyperparameter optimization is considered to belong. The method was tested against state-of-the-art hybrid solutions. Classification was performed in the first experiment for vessel identification, while regression was carried out in the second experiment to predict the vessel's trajectory. The authors propose a robust framework for vessel classification and multivariate trajectory forecasting, which exploits XGBoost and LSTM tuning based on the BPSO metaheuristic.
The primary contributions of this work can be outlined as follows:
  • The introduction of a robust framework for tackling pressing and important issues in maritime vessel identification and tracking;
  • The proposal of a robust XGBoost-based mechanism for marine vessel identification and classification;
  • The proposal of a time series-based approach that leverages LSTM networks for vessel trajectory forecasting;
  • The introduction of a boosted PSO specifically for addressing the hyperparameter tuning needs of XGBoost and LSTM;
  • The interpretation of the best-performing classification model using explainable AI techniques, to better understand feature importance when tackling this pressing issue.
The structure of this paper is as follows: Section 2 provides the necessary introduction to the technologies applied in this work, Section 3 gives details about the original PSO algorithm and the modifications performed towards the construction of the final model, Section 4 provides the experimental foundation of the research, the results are provided in Section 5, and Section 6 concludes the paper.

2. Background

In the following Section, the fundamentals of this research area are provided. First, the XGBoost method is described, along with the equations that describe its behavior. The basics of a recurrent neural network (RNN) are provided, followed by the foundation of LSTM. Furthermore, parameters for tuning an LSTM are provided. Next, the basics of metaheuristic methods are shown, followed by a brief background on Shapley additive explanations (SHAP).

2.1. Extreme Gradient Boosting Algorithm

XGBoost is recognized as an ML method with distinguished performance [12,13]. Its parameter settings require tuning, as they have a key impact on the performance of the model. The philosophy behind this technique is to combine multiple weak learners into one model capable of accurate prediction. The performance improvements are noticeable due to the utilization of techniques such as the mentioned optimization, as well as regularization and gradient boosting. Predictions are based on the patterns the model has been trained on, capturing complex dependencies between inputs and targets.
The number of XGBoost parameters is not small, and the optimal combination for the model configuration is not realistically achievable using a trial-and-error method. Considering that certain problems are more complex, the process of optimization requires a robust solution [14]. Three aspects of the model should be considered as the goal of optimization: speed, generalization, and accuracy.
The best results are achieved through an iterative tuning process [13]. The objective function of the XGBoost solution is given in Equation (1).
$$\mathrm{obj}(\Theta) = L(\Theta) + \Omega(\Theta), \tag{1}$$
where the objective is the sum of the loss function and the regularization term. Here, $\Theta$ represents the XGBoost hyperparameter set, $L(\Theta)$ is the loss function, and $\Omega(\Theta)$ is the regularization term, through which the complexity of the model is managed. The loss function is given by the MSE:
$$L(\Theta) = \sum_i (y_i - \hat{y}_i)^2, \tag{2}$$
where $y_i$ is the actual value of the target variable and $\hat{y}_i$ is the predicted value for each sample $i$. For classification, the logistic loss is used instead:
$$L(\Theta) = \sum_i \left[ y_i \ln\left(1 + e^{-\hat{y}_i}\right) + (1 - y_i) \ln\left(1 + e^{\hat{y}_i}\right) \right]. \tag{3}$$
The goal of the loss function is to quantify the difference between actual and predicted values. By minimizing the overall loss function, the classification is improved.
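To make the tuning targets concrete, the following minimal sketch (not the authors' code) trains an XGBoost classifier on synthetic data; the parameter values shown are illustrative placeholders, not the tuned values reported later in this paper:

```python
# Illustrative sketch: training an XGBoost classifier with the regularized
# objective described above. All parameter values are placeholders.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1000, n_features=10, n_informative=6,
                           n_classes=3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

model = XGBClassifier(
    learning_rate=0.3,   # step-size shrinkage (eta), later subject to tuning
    max_depth=6,         # limits tree complexity, part of Omega(Theta)
    reg_lambda=1.0,      # L2 regularization weight in Omega(Theta)
    subsample=0.8,       # row subsampling per tree
)
model.fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```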

2.2. Long Short-Term Memory Model

RNNs were designed for sequential data such as time series [15]. Their structure is comparable to that of a multilayer perceptron, except that the hidden unit connections are enabled after a certain delay. Consequently, the model can detect correlations between temporal occurrences in the data that are not adjacent and can be far apart.
Despite the capabilities of the first versions of RNNs, they are not without deficiencies. The training of such models is hindered by the exploding and vanishing gradient problems. These problems can be mitigated by newer RNN variants, of which the LSTM is an example. However, these models can be complicated, as they require a large number of hyperparameters. It should be noted that a basic RNN can outperform newer recurrent models, such as the LSTM and gated recurrent unit (GRU), due to its smaller number of parameters [16].
On the other hand, an advantage of RNNs is that they do not require input vectors of fixed length with a correspondingly fixed output. Sequences and other rich structures can be used to take advantage of this characteristic. Put simply, an RNN can not only work with sequences of input vectors but also generate sequences as output. By retaining a hidden state, the RNN can process the data of the sequence.
Researchers have applied the LSTM to problems similar to the topic explored in this work. Storage is available for information within the network, which enables the use of previously learned information. Time series problems require that a model captures long-term dependencies. This is achieved by replacing the hidden units with memory cells. Data are manipulated using the three gates of the LSTM: the input, output, and forget gates.
The data pass through the forget gate $f_t$, which decides whether the data will be forgotten, based on Equation (4):
$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f), \tag{4}$$
for which the range of the gate $f_t$ is $[0,1]$, the sigmoid function is denoted by $\sigma$, the weight matrices are $W_f$ and $U_f$, and $b_f$ is the bias vector.
The second gate is the input gate, which decides which data will be saved, as shown in Equation (5):
$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i), \tag{5}$$
where $[0,1]$ is the range of $i_t$, and the learnable parameters are $b_i$, $W_i$, and $U_i$, while the candidate update vector $\tilde{C}_t$ is calculated using Equation (6):
$$\tilde{C}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c), \tag{6}$$
for which the learnable parameters are given as $b_c$, $W_c$, and $U_c$. Equation (7) describes the process that follows:
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t, \tag{7}$$
where $\odot$ signifies element-wise multiplication. The term $f_t \odot C_{t-1}$ governs which data from the previous cell state $C_{t-1}$ are kept or discarded, while the new data $i_t \odot \tilde{C}_t$ are stored in the updated cell state $C_t$.
The output gate $o_t$ and the hidden state $h_t$ are calculated using the sigmoid function, as given in Equations (8) and (9):
$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o), \tag{8}$$
$$h_t = o_t \odot \tanh(C_t), \tag{9}$$
where $o_t$ is defined in the range $[0,1]$, with learnable parameters $b_o$, $W_o$, and $U_o$. The output value $h_t$ is the element-wise product of $o_t$ and $\tanh(C_t)$, as given in Equation (9).

Hyperparameters of LSTM

Optimal hyperparameter value selection requires experimentation, as well as optimization, for which, in this case, the PSO is used. The performance of the model is directly influenced by the hyperparameters. The following list describes the hyperparameters usually optimized when configuring LSTM models; a brief illustrative code sketch follows the list:
1. The number of hidden layers ( n h i d ): The hidden layers determine the model’s depth. With an increase in the depth of the model, the computational complexity rises, as well as the complexity of the capturable patterns, dependencies, and, unfortunately, the chance of overfitting;
2. The number of hidden units per layer ( n u n i t ): Similarly to the previously described hyperparameter, the number of hidden neurons (units) increases the solvable pattern complexity and the chance of overfitting;
3. Type of RNN cell: Two types have been applied to date, the LSTM and GRU, with the purpose of long-range dependency capabilities and mitigation of the vanishing gradient issue.
4. Learning rate ( α ): A crucial hyperparameter that controls the update size of the weight of the model for the duration of training. Lower values result in a higher precision but slower convergence, while higher values risk missing the optimal solution as the process becomes faster.
5. Dropout rate ( p d r o p ): Regularization technique for overfitting prevention in neural networks. This technique is performed in the training phase, and random neurons are deactivated according to the dropout rate.
6. Batch size: The number of training samples used for a single update of the model weights. An increase in the batch size results in a higher gradient accuracy and faster training, at the cost of computational resources;
7. Sequence length: Input and output sequence length. Larger sequences result in more distant dependencies being capturable, but this also increases the overfitting risk and computational complexity.
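The sketch below (assumed Keras API, illustrative values) shows how these hyperparameters map onto an LSTM model definition; it is not the authors' implementation:

```python
# Minimal sketch mapping the listed hyperparameters onto an LSTM model.
from tensorflow import keras
from tensorflow.keras import layers

n_hid, n_unit = 2, 32        # number of hidden layers and units per layer
alpha, p_drop = 0.001, 0.1   # learning rate and dropout rate
seq_len, n_feat = 16, 5      # sequence length and input feature count

model = keras.Sequential()
model.add(keras.Input(shape=(seq_len, n_feat)))
for i in range(n_hid):
    # all but the last recurrent layer must return full sequences
    model.add(layers.LSTM(n_unit, return_sequences=(i < n_hid - 1)))
    model.add(layers.Dropout(p_drop))
model.add(layers.Dense(2))   # e.g., predicted latitude and longitude
model.compile(optimizer=keras.optimizers.Adam(learning_rate=alpha), loss="mse")
```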

2.3. Metaheuristic Methods and Related Works

The popularity of machine learning model optimization has increased significantly in recent years. Various reasons have influenced this trend; the two most important are the complexity of models, which only increases with new achievements, and the growing number of hyperparameters. This process used to be performed through trial and error but, for the stated reasons, this is no longer feasible. Considering that parameter selection requires optimal values for both discrete and continuous variables, it is considered a mixed NP-hard problem.
The main goal is to solve NP-hard problems within reasonable time limits, while maintaining realistic computational requirements. If the parameter selection process is treated as an optimization task, the metaheuristic group of algorithms can be applied to provide significant performance increases. Swarm metaheuristics are a sub-group of such solutions, based on cooperative behaviors observed in nature, and have distinguished themselves by their high performance on the mentioned task.
Solutions from this group that are often applied for optimization by researchers include the PSO [17], firefly algorithm (FA) [18], genetic algorithm (GA) [19], Harris hawks optimizer (HHO) [20], artificial bee colony (ABC) algorithm [21], bat algorithm (BA) [22], and whale optimization algorithm (WOA) [23]. Additionally, the recently introduced chimp optimization algorithm (ChOA) [24] has also been included in this work.
An even greater variety of metaheuristic real-world use cases have been recorded in recent years and some are listed in the following text: wireless sensor and IoT optimization [25,26,27,28]; medical image processing and classification [29,30]; credit card fraud identification [31,32]; COVID-19 case prediction [33,34]; fog, cloud, and cloud-edge computing system organization [35,36,37,38]; intrusion detection for network and computer systems [39,40,41]; feature selection [42,43]; the prediction of energy production and consumption [44,45,46]; the tuning of different ML structures [47,48,49,50,51]; and, lastly, tracking and predicting air pollution and environmental monitoring [52,53,54].
For the problem of trajectory prediction, many different optimization techniques have been applied. The review presented here includes approaches with and without the application of machine learning techniques, as well as works addressing trajectory prediction in general.
The research presented in [55] proposed ant-lion optimization of flight-trajectory LSTM predictions. The results indicated improvements in terms of convergence speed and accuracy. The authors in [56] applied a PSO-based algorithm to optimize a support vector machine for regression (SVR), used for urban-environment vehicle trajectory prediction based on global positioning system (GPS) and onboard diagnostics (OBD) integration (GOI). SVR parameter selection was the subject of optimization using the swarm metaheuristic. The paper in [57] employed AIS sensor data for SVR model training with adaptive chaos differential evolution. The AIS data included the ship's speed, course, time stamp, longitude, and latitude. Parameter optimization was performed with the goal of improving convergence and prediction. A system for early warning in maritime navigation was proposed by Suo et al. [58], employing AIS data alongside a GRU-based prediction model optimized using the DBSCAN algorithm. Vessel trajectories were the prediction target, and performance was compared to that of an LSTM on the same problem. The accuracy was close to that of the LSTM, but improvements were noted in computational cost.
In terms of metaheuristic optimization, research topics are varied, and the selected examples represent different types of trajectory prediction. In the case of Qian et al. [9], inland waterways were the prediction space, instead of the open-water examples explored in most works. The paper did so by employing a GA-based LSTM solution and also applied AIS data. The next problem optimized using a GA, after river-path prediction, was the flight optimization of unmanned aerial vehicles (UAVs) by Cacchiani et al. [59]. Their goal was to make UAVs easy to use for most users by adjusting speed based on the communication range of the users. The final solution combined the biased random-key genetic algorithm (BRKGA) with a simulated annealing algorithm (SAA). Rapid low-thrust trajectory predictions in deep space were performed using a proposed GA-optimized LSTM solution [60]. A different algorithm, a hybrid bat flight optimization algorithm (HBFO), was explored by Wood et al. [61] for the problem of well-bore trajectory prediction.

2.4. Shapley Additive Explanations

To provide a clear representation of the model’s performance, this research utilized the SHAP method. This approach is valued for its ability to offer understandable and direct explanations of a model’s decisions, while avoiding the typical trade-off between accuracy and interpretability [62,63,64,65]. To calculate the feature importance, a game-theory approach based on Shapley values [66] was employed, to enhance individual predictions.
By determining the discrepancy between predictions and their averages [67], each participating party (feature) receives a share of the collective reward based on its contribution. The distribution of payouts in this way is referred to as Shapley values. When a feature is assigned a baseline value (the mean), SHAP assesses its impact on the model's prediction, attributing to each feature an importance measure based on its individual contribution to a particular prediction. This approach yields valuable insights, minimizes the risk of underestimating a feature's importance, captures interaction effects based on Shapley value generalizations, and enables the interpretation of a model's global behavior while retaining local accuracy [68,69,70].

3. Introduced Modified Metaheuristics

This section first provides a brief overview of baseline PSO metaheuristics. Afterward, the motivations for its improvement are given, along with the modifications introduced and inner-working details of the devised modified PSO approach.

3.1. PSO

Developed in 1995, the PSO algorithm was introduced by Kennedy and Eberhart [17], with the flocking of fish and birds as its main inspiration. Search agents are defined as particles, considered individuals in the population. The algorithm is capable of providing satisfactory results for discrete and continuous optimization problems. The best experience of each individual, and of those in its neighborhood, is leveraged as a form of collective intelligence.
The process begins with random initial positions and velocities being assigned to every particle. During the iterations, the particles move toward the best location retained at the end of each iteration. Particles move at a given velocity, which can be described by three weighted components: the old velocity, the velocity toward the best solution found by the particle so far, and the direction toward the best solution obtained from its neighbors, as captured in Equation (10):
$$v_i \leftarrow v_i + U(0, \phi_1) \otimes (p_i - x_i) + U(0, \phi_2) \otimes (p_g - x_i), \quad x_i \leftarrow x_i + v_i, \tag{10}$$
where $\otimes$ denotes component-wise multiplication, all components of $v_i$ are kept within the range $[-v_{max}, +v_{max}]$, and, at every iteration, a vector $U(0, \phi_i)$ of uniformly distributed random numbers in the range $[0, \phi_i]$ is generated for each particle. Finally, $p_g$ refers to the global best position.
Each particle can be a solution in a D-dimensional space, for which the position is defined as
$$x_i = (x_{i1}, x_{i2}, \ldots, x_{iD}). \tag{11}$$
The best position before the next update is given as
$$p_i = (p_{i1}, p_{i2}, \ldots, p_{iD}). \tag{12}$$
Finally, the velocities are represented as
$$v_i = (v_{i1}, v_{i2}, \ldots, v_{iD}). \tag{13}$$
$pbest$ and $gbest$ represent the best solution so far for the particle and the best solution in the group, respectively. Considering both pieces of information, the particle needs to decide its next move depending on the distance from its current position to $pbest$ and $gbest$.
Applying an inertia weight approach, this behavior can be modeled as:
$$v_{id} = w \cdot v_{id} + c_1 \cdot r_1 \cdot (p_{id} - x_{id}) + c_2 \cdot r_2 \cdot (p_{gd} - x_{id}), \tag{14}$$
for which $v_{id}$ is the particle velocity; $x_{id}$ is the particle's current position; the inertia factor is $w$; $c_1$ and $c_2$ are parameters defining the relative influence of the cognitive and social components, respectively; $r_1$ and $r_2$ are random numbers; and $pbest$ and $gbest$ are denoted as $p_{id}$ and $p_{gd}$, respectively.
The inertia factor is applied as follows:
$$w = w_{max} - \frac{w_{max} - w_{min}}{T} \cdot t, \tag{15}$$
where the initial weight is $w_{max}$, the final weight is $w_{min}$, the maximum iteration number is $T$, and the current iteration is denoted as $t$.
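As a concrete illustration, the following sketch (assuming a minimization problem with NumPy arrays of shape (particles, D)) performs one inertia-weight PSO update following Equations (14) and (15):

```python
# Illustrative sketch of one inertia-weight PSO update step.
import numpy as np

def pso_step(x, v, p_best, g_best, t, T,
             c1=2.0, c2=2.0, w_max=0.9, w_min=0.4, v_max=1.0):
    """Return updated positions and velocities for iteration t of T."""
    w = w_max - (w_max - w_min) / T * t   # linearly decaying inertia (Eq. (15))
    r1 = np.random.rand(*x.shape)         # cognitive random factor
    r2 = np.random.rand(*x.shape)         # social random factor
    v = w * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)
    v = np.clip(v, -v_max, v_max)         # keep velocities within range
    return x + v, v
```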

3.2. Modified PSO Approach

The elementary variant of PSO is a robust tuning algorithm. Nevertheless, it exhibits some deficiencies, as was noted during extensive simulations using the well-known CEC (Congress on Evolutionary Computation) benchmark functions [71]. First of all, the exploration power, as well as the convergence speed, of the PSO could be improved. Another noted issue is that, in some executions, the algorithm can become stuck in areas where local optima reside. If this happens, the algorithm converges prematurely in a suboptimal domain, missing the region where the best results are found. Aiming to tackle these flaws of the basic PSO, this research introduces two modifications, which are described in the following subsections.

3.2.1. Chaotic Elite Learning

The first alteration introduced is adding the chaotic elite learning technique, a strategy that can help the algorithm avoid premature convergence to incorrect regions of the search space, therefore making the algorithm more efficient and capable of discovering optimal solutions [72]. Improvement of the individual is achieved by applying a chaotic sequence to produce new solutions in its proximity. This approach helps preserve the diversity of the population and also allows an easier escape from the suboptimal regions. The proposed modification revolves around using a logistic map, given by Equation (16):
$$c_{i+1} = 4 \times c_i \times (1 - c_i), \tag{16}$$
here, $i$ marks the iteration number, $c_i$ represents the chaotic value belonging to the $i$-th round, while the initial value $c_0$ is generated randomly in the interval $[0,1]$.
This procedure is utilized to update the best solution P i , as follows:
$$P_{i,j} = P_{i,j} + rand \times (2 \times c_i - 1), \tag{17}$$
here, $P_{i,j}$ corresponds to the $j$-th component of the best position that is being updated, and $rand$ denotes a uniformly distributed random number in $[0,1]$.
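A minimal sketch of this chaotic elite update (illustrative names, assuming NumPy arrays) could look as follows:

```python
# Illustrative sketch of the chaotic elite update (Equations (16) and (17));
# p_best is the current best solution as a NumPy array, c is the chaos state.
import numpy as np

def chaotic_elite_update(p_best, c):
    """Perturb the best solution with a logistic-map term; return (candidate, next c)."""
    candidate = p_best + np.random.rand(*p_best.shape) * (2.0 * c - 1.0)
    return candidate, 4.0 * c * (1.0 - c)   # advance the logistic map
```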

3.2.2. Lévy Flight

To enhance the updating procedure, instead of using random walk, which is not necessarily the most productive solution, the Lévy flight mechanism is considered in this paper [73,74]. It has proven to be very efficient [75,76], and it is used to improve the global search procedure. Each solution is provided the option to utilize an occasional long flight distance, which allows jumping out of the suboptimal region with respect to the overall performance of the solution. This distance is gradually decreased with time, as the algorithm converges and narrows down to the promising domain, and it is not necessary to jump larger distances at this point. This approach is explained in Equations (18) and (19)
$$X_{i,j} = X_{i,j}^{best} \times L \times e, \tag{18}$$
$$e = a^2, \tag{19}$$
here, $X_{i,j}$ represents the $j$-th component of the $i$-th solution being updated, while $X_{i,j}^{best}$ represents the optimal individual guidance of the $i$-th solution. Parameter $e$ is the scaling factor, decreasing over the rounds, and $a$ is obtained through
$$a = 1 + \cos\left(\frac{\pi \times t}{T}\right), \tag{20}$$
here, $t$ represents the current round, while $T$ denotes the maximum number of rounds. Finally, $L$ is the Lévy flight distribution, calculated as
$$L = s \times \frac{u \times \phi}{|\nu|^{\frac{1}{\tau}}}, \tag{21}$$
where $\tau$ represents the Lévy index; $s$ is a constant with value $0.01$, which has the role of preventing overly long jumps; and the parameters $u$ and $\nu$ are arbitrary values in the range $[0,1]$. Finally, parameter $\phi$ can be obtained using the following equation:
$$\phi = \left( \frac{\Gamma(1+\tau) \times \sin\left(\frac{\pi \times \tau}{2}\right)}{\Gamma\left(\frac{1+\tau}{2}\right) \times \tau \times 2^{\frac{\tau-1}{2}}} \right)^{\frac{1}{\tau}}, \tag{22}$$
where $\Gamma$ represents the gamma function, obtained using
$$\Gamma(1+\tau) = \int_0^\infty x^{\tau} e^{-x} \, dx. \tag{23}$$
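For illustration, a sketch of the Lévy flight step under the equations above; note that the scaling relation reconstructed as $e = a^2$ in Equation (19) should be treated as an assumption:

```python
# Illustrative sketch of the Lévy flight step (Equations (18)-(23)).
import numpy as np
from math import cos, gamma, pi, sin

def levy_step(x_best, t, T, tau=1.5, s=0.01):
    """Produce a Lévy-scaled move around the guiding solution x_best."""
    a = 1.0 + cos(pi * t / T)              # decays from 2 to 0 (Eq. (20))
    e = a ** 2                             # scaling factor (Eq. (19), assumed)
    phi = (gamma(1 + tau) * sin(pi * tau / 2) /
           (gamma((1 + tau) / 2) * tau * 2 ** ((tau - 1) / 2))) ** (1 / tau)
    u = np.random.rand(*x_best.shape)      # u and nu drawn from [0, 1]
    nu = np.random.rand(*x_best.shape)
    L = s * u * phi / np.abs(nu) ** (1 / tau)   # Lévy distribution (Eq. (21))
    return x_best * L * e                  # Eq. (18)
```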

3.3. Modified PSO Pseudo-Code

The modified algorithm has been named BPSO, reflecting the fact that the elementary version lacks exploitation, which is addressed by the incorporated alterations. The first modification, chaotic elite learning, is applied to the current best solution in every iteration, while the second modification, Lévy flight, is incorporated into the standard search mechanism of the proposed BPSO approach. The process of updating the solutions' positions is executed using either the standard PSO search (Equation (10)) or the Lévy flight search procedure (Equation (18)). This behavior is controlled by the pseudo-random number $\phi$, which is uniformly distributed between 0 and 1 and generated for each solution at every iteration. In this way, a better balance between exploitation and exploration is established, according to practical observations. Finally, taking all of this into account, a simplified pseudo-code of the BPSO, with the maximum number of iterations $T$ as the termination condition, is given in Algorithm 1.
Algorithm 1 Pseudo-code of the BPSO metaheuristics
  • Initialize values of static and dynamic control parameters
  • Initialize particle population array using random positions and velocities on D dimensions in the search space
  • repeat
  •    Evaluate the desired optimization fitness function of each particle in D variables
  •    Compare the outcome with p b e s t i
  •    if Current value is better than the p b e s t i  then
  •        Set p b e s t i to the value of the current best
  •        Set p i to the current location x i in D-dimensional space
  •    end if
  •    Assign the index of the best solution so far to the variable g
  •    if  ϕ <= 0.5 then
  •        Adjust the particle’s position according to the Equation (10)
  •    else
  •        Adjust the particle’s position according to the Equation (18)
  •    end if
  •    Update the best solution’s position by applying chaotic update given by (Equation (17))
  •    Update dynamic control parameters’ values
  • until The criterion is met
The proposed BPSO does not entail additional complexity compared to the basic PSO in terms of fitness function evaluations (FFEs). Therefore, the complexity of basic PSO and BPSO can be given as O ( N ) = N + N · T , where N represents the number of solutions in the population.

4. Experimental Environment and Preliminaries

To validate the performance of the proposed approach, as well as demonstrate the optimization potential of the introduced modified metaheuristic, two sets of experiments were conducted. The first set of experiments leveraged XGBoost to classify vessel types based on the available AIS data. Two objective functions were utilized for classification. The first utilized Cohen’s kappa metric, while the second relied on the classification error rate. The second experimental set relied on AIS positional data to predict the vessel trajectory.

4.1. Datasets and Preprocessing

The dataset used for the classification experiments is publicly available and was acquired from Kaggle (https://www.kaggle.com/datasets/eminserkanerdonmez/ais-dataset, accessed on 28 June 2023). The data consist of 10 relevant features available in AIS data, including vessel Maritime Mobile Service Identity (MMSI) numbers, navigational status, speed over ground, course over ground, vessel heading, width, length, and draft. These features were leveraged to determine the type of maritime vessel. Furthermore, the navigation status feature, given as a string in the original dataset, needed further processing. Originally, this string was encoded as an integer, owing to the ability of the decision tree model to handle categorical features encoded as integers. However, compared to one-hot encoding, this approach yielded less favorable outcomes. Therefore, the ship navigation status was encoded using one-hot encoding, as sketched below. The vessel class distributions in the classification dataset can be seen in Figure 1.
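```python
# Illustrative sketch (hypothetical file and column names) of the one-hot
# encoding applied to the categorical navigational-status feature.
import pandas as pd

df = pd.read_csv("ais_dataset.csv")  # hypothetical path to the Kaggle data
df = pd.get_dummies(df, columns=["NavigationalStatus"], prefix="nav")
```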
The dataset used for the second experiment, forecasting the vessel trajectory, was acquired from the public U.S. marine cadastre (https://marinecadastre.gov/ais/, accessed on 28 June 2023). These data on the location and properties of ships in both U.S. and international waters were gathered by the U.S. Coast Guard using a navigation safety device installed onboard, which transmits and tracks this information. The AIS data for 31 March 2022 were used for this experiment. However, due to the large number of vessels available in the dataset, a single vessel was selected for experimentation. The selected vessel was chosen because it had the largest amount of available data. Nevertheless, the trajectory data needed further preprocessing before being used for training and testing.
To prepare the trajectory dataset for further experimentation, features irrelevant to the trajectory needed to be removed. These included the ship's MMSI, vessel type, transceiver class, navigational status, length, and height. Thus, the remaining features included the latitude and longitude, speed over ground, course over ground, and heading. Due to the inconsistent nature of the transmissions, the data were resampled into 60-second intervals, and the missing values were interpolated using polynomial interpolation techniques. Finally, two additional values were derived from the vessel heading and speed over ground. These represented the vessel velocities in the X and Y directions. The introduction of these features helped the model account for the complex relationship between heading and trajectory and improved the prediction accuracy. A sketch of these steps is given below.
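```python
# Illustrative sketch of the trajectory preprocessing described above; the
# file name, column names, and polynomial order are assumptions.
import numpy as np
import pandas as pd

df = pd.read_csv("ais_2022_03_31.csv", parse_dates=["BaseDateTime"])
df = df.set_index("BaseDateTime").sort_index()
cols = ["LAT", "LON", "SOG", "COG", "Heading"]

# Resample irregular transmissions to 60-second intervals, then fill the
# resulting gaps with polynomial interpolation.
df = df[cols].resample("60s").mean().interpolate(method="polynomial", order=2)

# Derive velocity components from speed over ground and heading.
heading_rad = np.deg2rad(df["Heading"])
df["VX"] = df["SOG"] * np.sin(heading_rad)
df["VY"] = df["SOG"] * np.cos(heading_rad)
```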

4.2. Experimental Setup

Experiments included comparing the performance of several well-established optimization algorithms to the introduced metaheuristic. Each algorithm was tasked with selecting a set of parameters yielding the best-performing model. Alongside the introduced metaheuristics, the original PSO [17] and GA [19] were also evaluated. Additionally, well-known optimizers were also subjected to a comparative analysis, including the ABC [21], BA [22], WOA [23], HHO [20], and the relatively newly introduced ChOA [24].
For the classification experiment, metaheuristics were allocated a population size of 10, allowed 10 iterations, and tasked with optimizing XGBoost hyperparameters and improving outcomes. The parameters were chosen for optimization due to their high influence on model performance. The parameters and their respective ranges included: learning rate $[0.1, 0.9]$, min child weight $[1, 10]$, subsample $[0.01, 1]$, colsample by tree $[0.01, 1]$, max depth $[3, 10]$, and gamma $[0, 0.8]$. The dataset used for classification was divided, with an initial 70% used to train the models, the following 10% for validation, and the final 20% for testing. A flowchart is shown on the left of Figure 2.
For trajectory forecasting, models were provided with the vessel latitude, longitude, speed over ground, course over ground, and heading, and tasked with forecasting the latitude and longitude of the vessel. The LSTM models were provided with 16 lags (samples considered as inputs for a network) of data as inputs and a batch size of 20, using the Keras library TimeSeriesGenerator to facilitate model training. The number of lags was empirically determined as a trade-off between the memory and computational demands and the accuracy, to give the best results. The training was carried out with 50% of the available data, and 50% was used for testing. Metaheuristics were tasked with selecting optimal LSTM network parameters and architecture. The parameters selected for optimization were, once again, selected due to their significant influence on model performance. The parameters and their respective ranges included the learning rate within the range [ 0.0001 , 0.01 ] , dropout [ 0.05 , 0.2 ] , and a number of training epochs [ 30 , 50 ] . Finally, the network architecture was also optimized, and the number of layers was selected between [ 1 , 3 ] . The number of neurons in each layer was also optimized, and the number of neurons per layer was selected from a [ 16 , 32 ] range.
However, the constraints depended on the number of lags utilized: neuron counts were constrained to the range $[lags, 2 \cdot lags]$. A validation set was not employed, and the dataset was split into training and testing sets in a 50/50 ratio. A flowchart is shown on the right of Figure 2. A sketch of the described windowing setup is given below.
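```python
# Illustrative sketch of the described windowing: 16 lags, batch size 20, and
# a 50/50 train/test split; `features` and `targets` are placeholder arrays.
import numpy as np
from tensorflow.keras.preprocessing.sequence import TimeseriesGenerator

features = np.random.rand(1000, 5)  # LAT, LON, SOG, COG, Heading (placeholder)
targets = features[:, :2]           # LAT and LON as forecasting targets

split = len(features) // 2          # 50/50 split, no validation set
train_gen = TimeseriesGenerator(features[:split], targets[:split],
                                length=16, batch_size=20)
test_gen = TimeseriesGenerator(features[split:], targets[split:],
                               length=16, batch_size=20)
```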

Metrics Used for Validation and Comparative Analysis

Two sets of evaluation metrics were used to facilitate experimentation and to provide a fair comparison between the optimization metaheuristics. For the classification experiment, the standard classification metrics, including the accuracy (Equation (24)), recall (Equation (25)), precision (Equation (26)), and $F_1$ score (Equation (27)), were recorded and presented. Additionally, Cohen's kappa metric (Equation (28)) was also utilized, due to the ability of this metric to provide a better evaluation of methods for imbalanced data, such as the dataset utilized in this work.
$$Accuracy = \frac{TruePositives + TrueNegatives}{TruePositives + TrueNegatives + FalsePositives + FalseNegatives}, \tag{24}$$
$$Recall = \frac{TruePositives}{TruePositives + FalseNegatives}, \tag{25}$$
$$Precision = \frac{TruePositives}{TruePositives + FalsePositives}, \tag{26}$$
$$F_1\,Score = \frac{2 \times Precision \times Recall}{Precision + Recall}, \tag{27}$$
$$Cohen's\ kappa = \frac{ObservedAgreement - ExpectedAgreement}{1 - ExpectedAgreement}. \tag{28}$$
For the time series forecasting experiments for the vessel trajectories, regression metrics were utilized. The utilized metrics were the mean absolute error (MAE, Equation (29)), MSE (Equation (30)), root mean squared error (RMSE, Equation (31)), and coefficient of determination ($R^2$, Equation (32)). Two additional metrics were included in the analysis: the index of agreement (IoA, Equation (33)) and the Euclidean distance error (EDE, Equation (34)), which was used to determine the distance between the predicted vessel location and the actual location.
$$MAE = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|, \tag{29}$$
$$MSE = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2, \tag{30}$$
$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}, \tag{31}$$
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}, \tag{32}$$
$$IoA = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} \left( |\hat{y}_i - \bar{y}| + |y_i - \bar{y}| \right)^2}, \tag{33}$$
where $y_i$ denotes the observed values, $\hat{y}_i$ the forecast values, and $\bar{y}$ the mean of the observed values. Finally, $n$ denotes the number of observed samples. The EDE metric determines the distance between the predicted and actual positions, as described in Equation (34):
$$EDE = \sqrt{(x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2}, \tag{34}$$
where $x_i$ and $y_i$ represent the actual vessel coordinates, while $\hat{x}_i$ and $\hat{y}_i$ denote the predicted values of each coordinate. To guide the optimization process, the MSE metric was selected as the objective function, with minimization as the goal. The mean value of the EDE was utilized as the indicator function in the same experiment; a short sketch of its computation follows.
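```python
# Illustrative sketch of the EDE computation (Equation (34)); inputs are
# NumPy arrays of actual and predicted coordinates.
import numpy as np

def euclidean_distance_error(lat, lon, lat_hat, lon_hat):
    """Per-sample Euclidean distance between actual and predicted positions."""
    return np.sqrt((lat - lat_hat) ** 2 + (lon - lon_hat) ** 2)

# The mean EDE served as the indicator function alongside the MSE objective:
# mean_ede = euclidean_distance_error(lat, lon, lat_hat, lon_hat).mean()
```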

5. Experimental Outcomes, Comparative Analysis, and Discussion

The following section presents the outcomes of the two experiments conducted using the respective testing setups described in Section 4. Initially, vessel classification is addressed. The trajectory prediction outcomes are explored in the section following the classification outcomes. The experimental results are then discussed and validated using rigorous statistical tests. The bold font indicates the best results in the tables to follow. Finally, the best attained models are analyzed and the feature importance is presented.

5.1. Experimental Observations and Comparative Analysis

The experiments tackled two problems. First, the classification of vessels was performed. Second, the trajectory of the vessel had to be determined. The contribution is composite, as both types of prediction affect the final decision that a human would make based on the results provided by the two different experimental groups. AIS data were used in both cases. For classification, error minimization was performed first, followed by a second sub-group of experiments that evaluated Cohen's kappa coefficient maximization outcomes.

5.2. Experiment 1: Marine Vessel Classification

The first set of experiments included the standard metrics applied for comparison, which were the best, worst, mean, standard deviation, and variance. Table 1 shows the overall performance for the classification error minimization, while Table 2 provides the results for the Cohen’s kappa coefficient for the same problem. For both cases, the proposed solution dominated all categories, resulting in an undisputed performance.
Table 3 provides the results for the precision, recall, F 1 score, and support metrics for the proposed solution.
Table 4 provides the obtained hyperparameter values of the XGBoost models for each metaheuristic optimizer.
Figure 3 displays multiple plots for the error minimization experiments. The objective box plot and convergence are shown, showcasing the speed of the proposed solution. The diversity swarm plot for the metaheuristics is displayed, as well as a precision recall curve and receiver operating characteristics curve. Figure 4 shows the confusion matrix for the error minimization experiments.
Similarly to Table 1 and Table 2, Table 5, displaying the Cohen's kappa maximization results, and Table 6, showing the classification error under Cohen's kappa maximization, indicate the superior and dominant performance of the proposed XG-BPSO. Table 7 provides the same metrics as Table 3 for the Cohen's kappa experiments.
The results of the optimal hyperparameters using metaheuristic optimizers for Cohen’s kappa maximization are given in Table 8.
Figure 5 displays multiple plots for the kappa indicator maximization experiments. The objective box plot and convergence are given with a diversity swarm plot, as well as a precision recall curve and receiver operating characteristics curve. Figure 6 shows a confusion matrix for the kappa indicator maximization experiments.

5.3. Experiment 2: Marine Trajectory Forecasting

The second type of experiment involved trajectory prediction of the vessel. Multivariate forecasting was performed with ('LAT', 'LON', 'SOG', 'COG', 'Heading') as input features. The target features represented the location of the vessel as ('LAT', 'LON').
The LSTM-BPSO obtained the best performance with the best metrics, as Table 9 indicates. The second best in the same category was the original PSO. The LSTM-ChOA obtained the lowest value for the worst category, while the LSTM-GA had the best results for mean and median values. The LSTM-BA model had the smallest standard deviation and variance.
The MSE and RMSE were best with the LSTM-BPSO, while the LSTM-PSO had the best results for the $R^2$, MAE, and EDE, as displayed in Table 10.
Table 11 displays the best parameters obtained by all tested metaheuristic models.
The results of the trajectory predictions for all metaheuristics are visualized using box plots for the objective function and $R^2$, as well as kernel density estimation (KDE) and diversity swarm plots for the same two indicators, in Figure 7.
A comparison between the best predictions of the proposed LSTM-BPSO and the basic LSTM-PSO is shown in Figure 8. It can be highlighted that the boosted PSO algorithm’s performance was superior in comparison to the elementary PSO.

5.4. Statistical Evaluation

To further examine the simulation results and determine their statistical significance, we gathered and examined the highest scores attained over 30 independent runs of each metaheuristic approach considered. These data were evaluated as a series of data points. The first step involved deciding the appropriate type of statistical test, more precisely, whether parametric or non-parametric tests should be employed. To start, we evaluated the suitability of parametric tests by checking the independence, normality, and homoscedasticity of variance of the data [77]. The independence condition was met, since each individual execution of the metaheuristics commenced by producing a random set of solutions.
The normality condition was evaluated by conducting a Shapiro–Wilk test, with individual problem analysis [78] for all three experiments carried out in this research: marine vessel classification (with error and Cohen’s kappa as objectives) and marine trajectory prediction. A Shapiro–Wilk test was executed independently for each approach considered in all three scenarios, resulting in p-values for each test. In all cases, the obtained p-values were below 0.05 , indicating that it was possible to reject the null hypothesis (H0). Consequently, we can conclude that the obtained results in all three simulations did not follow a normal distribution. The Shapiro–Wilk test outcomes for the three scenarios considered are presented in Table 12.
As the normality assumption was not met, it was not appropriate to employ parametric tests. Therefore, the next step involved utilizing a non-parametric Wilcoxon signed-rank test [79], which was applied to the same data series, consisting of the best values achieved in each run for each metaheuristic.
The provided BPSO was employed as the control algorithm, and a Wilcoxon signed-rank test was executed on the mentioned data series. In all three observed scenarios, the determined p-values were below 0.05 . These results indicated that the suggested BPSO algorithm exhibited statistically significant superiority compared to the competing methods, considering a significance threshold of α = 0.1 . In the case of α = 0.05 , the introduced BPSO was significantly better than the other contenders, except BA in the third scenario (trajectory experiment), and that result is marked bold. The overall results of the Wilcoxon signed-rank test are presented in Table 13.
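For reference, the statistical procedure described above can be sketched as follows; the score arrays are placeholders standing in for the 30 best results per optimizer:

```python
# Illustrative sketch: Shapiro-Wilk normality check followed by a Wilcoxon
# signed-rank test, mirroring the procedure described above.
import numpy as np
from scipy.stats import shapiro, wilcoxon

rng = np.random.default_rng(0)
bpso_scores = rng.random(30)        # placeholder result series (control)
rival_scores = rng.random(30)       # placeholder result series (contender)

_, p_norm = shapiro(bpso_scores)    # normality test per approach
if p_norm < 0.05:                   # normality rejected -> non-parametric test
    _, p = wilcoxon(bpso_scores, rival_scores)
    print(f"Wilcoxon signed-rank p-value: {p:.4f}")
```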

5.5. Top-Performing Model Outcome Interpretation

In modern AI research, model interpretation plays an increasingly important role. Often, understanding why a model has made a certain decision is just as important as the decision itself. While feature importance is often used simply to reduce feature lists or debug complex models, its interpretation can provide valuable insights into the problem being addressed.
Several approaches exist in the modern literature for tackling interpretation, and a popular choice among researchers is the use of SHAP [66] to determine feature impacts using techniques from game theory. Additionally, the XGBoost approach comes with a native method for determining feature importance based on Gini impurity [80]. Understanding and comparing these importance measures can help improve vessel identification, as well as inform future research. The outcomes of both analyses are showcased in Figure 9.
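A minimal sketch of both analyses, assuming the shap library and the fitted XGBoost classifier `model` with test features `X_test` from the earlier sketch:

```python
# Illustrative sketch of SHAP-based and native XGBoost feature importance.
import shap

explainer = shap.TreeExplainer(model)         # game-theoretic attribution
shap_values = explainer.shap_values(X_test)   # per-class SHAP values
shap.summary_plot(shap_values, X_test)        # global feature-importance view

# XGBoost's native importance for comparison (impurity/gain-based measure).
native_importance = model.get_booster().get_score(importance_type="gain")
```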
The feature importance analysis suggested that vessel length played an important role in accurate classification for all vessel classes, followed by vessel width, draft, and MMSI. Dynamic parameters such as speed over ground, heading, and course over ground played a less significant role. This is to be expected, as these parameters can coincide regardless of vessel class. Similar outcomes were established with the XGBoost importance analysis; however, speed over ground played a larger role, likely due to the speed limitations of larger vessels.
Another important aspect of feature analysis is that features influence different decisions in distinct ways. This is demonstrated in Figure 10 and Figure 11, where the feature dependencies for the top six features are shown for the cargo and tug vessel classes.
Several interesting observations can be made from Figure 10 and Figure 11. Vessel length played a more significant role in cargo vessel identification than in tug boat identification. Vessel width played a similar role. On the other hand, draft was significantly more important for tug vessels than for cargo vessels.

6. Conclusions

This research proposes a novel method of naval vessel classification and trajectory prediction, based on the XGBoost model for classification and LSTM for time-series forecasting. These models were applied to real-world maritime datasets, which included AIS data. The need for such solutions is multi-fold, as it benefits, most importantly, wildlife by directly reducing negative effects on nature. Furthermore, general safety is also improved, as any illegal behavior is harder to conceal when a sophisticated monitoring system is used.
A novel variant of the PSO algorithm was proposed as part of this research, devised to tackle the known drawbacks of the elementary PSO. The novel algorithm was named BPSO and subsequently used to tune the XGBoost and LSTM models. The predictions were made using extensive trip data, including various parameters related to the time gaps since the last position at which the ship's signal was picked up. Route optimization for optimal and efficient delivery is an additional benefit of the performed optimization. To validate the achieved improvements, the proposed method was compared to various high-performing algorithms for NP-hard problem optimization. The experiments indicated that the proposed solution was dominant and obtained the best results. The XGBoost model tuned using the BPSO algorithm attained an overall accuracy of 99.72% for vessel classification, while the LSTM-BPSO achieved an MSE of 0.000098 on the marine trajectory prediction dataset.
As with all research, certain limitations apply to this work. Given the high computational demands of model optimization, only a limited number of algorithms were considered for evaluation. Furthermore, limited population sizes and a relatively modest number of runs were used. Certain practical limitations also exist: the data gathered from AIS systems are often noisy and inconsistent, requiring moderate preprocessing to remove noise.
Future works will focus on further refining the proposed approach, developing methods for accounting for the complexity associated with AIS data, and further improving the accuracy of both vessel classification and trajectory forecasting. Finally, the potential applications of the introduced boosted PSO will be explored in different fields.

Author Contributions

Conceptualization, N.B., M.Z., R.D. and L.J.; methodology, N.B., A.P. and L.J.; software, N.B., M.Z., L.J. and R.D.; validation, A.T., V.S., N.B. and L.J.; formal analysis, A.P. and V.S.; investigation, N.B., M.Z. and A.P.; resources, A.T., A.P. and R.D.; data curation, M.Z., A.T. and A.P.; writing—original draft preparation, R.D., A.T., A.P. and L.J.; writing—review and editing, M.Z., N.B., R.D. and P.S.; visualization, N.B., A.P. and M.Z.; supervision, M.Z., N.B., R.D. and P.S.; project administration, A.T., A.P. and L.J.; funding acquisition, M.Z. and N.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All datasets employed in this study are freely available at the following URLs: https://www.kaggle.com/datasets/eminserkanerdonmez/ais-dataset (accessed on 28 June 2023) and https://marinecadastre.gov/ais/ (accessed on 28 June 2023). Partial source code is available via the following Github URL: https://github.com/nbacanin/AppliedSciences2023 (accessed on 28 June 2023).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Han, X.; Armenakis, C.; Jadidi, M. Modeling vessel behaviours by clustering AIS data using optimized DBSCAN. Sustainability 2021, 13, 8162.
  2. Renso, C.; Bogorny, V.; Tserpes, K.; Matwin, S.; Macêdo, J. Multiple-aspect analysis of semantic trajectories (MASTER). Int. J. Geogr. Inf. Sci. 2021, 35, 763–766.
  3. Xiao, Z.; Fang, H.; Jiang, H.; Bai, J.; Havyarimana, V.; Chen, H.; Jiao, L. Understanding Private Car Aggregation Effect via Spatio-Temporal Analysis of Trajectory Data. IEEE Trans. Cybern. 2023, 53, 2346–2357.
  4. Huang, J.; Zhu, F.; Huang, Z.; Wan, J.; Ren, Y. Research on Real-Time Anomaly Detection of Fishing Vessels in a Marine Edge Computing Environment. Mob. Inf. Syst. 2021, 2021, 1–15.
  5. Zheng, Y.; Zhang, Y.; Qian, L.; Zhang, X.; Diao, S.; Liu, X.; Cao, J.; Huang, H. A lightweight ship target detection model based on improved YOLOv5s algorithm. PLoS ONE 2023, 18, e0283932.
  6. Yang, F.; Qiao, Y.; Wei, W.; Wang, X.; Wan, D.; Damaševičius, R.; Woźniak, M. DDTree: A hybrid deep learning model for real-time waterway depth prediction and smart navigation. Appl. Sci. 2020, 10, 2270.
  7. Zhou, Y.; Daamen, W.; Vellinga, T.; Hoogendoorn, S.P. Ship classification based on ship behavior clustering from AIS data. Ocean Eng. 2019, 175, 176–187.
  8. Wang, X.; Xiao, Y. A Deep Learning Model for Ship Trajectory Prediction Using Automatic Identification System (AIS) Data. Information 2023, 14, 212.
  9. Qian, L.; Zheng, Y.; Li, L.; Ma, Y.; Zhou, C.; Zhang, D. A new method of inland water ship trajectory prediction based on long short-term memory network optimized by genetic algorithm. Appl. Sci. 2022, 12, 4073.
  10. Zheng, Y.; Li, L.; Qian, L.; Cheng, B.; Hou, W.; Zhuang, Y. Sine-SSA-BP Ship Trajectory Prediction Based on Chaotic Mapping Improved Sparrow Search Algorithm. Sensors 2023, 23, 704.
  11. Chen, B.; Hu, J.; Zhao, Y.; Ghosh, B.K. Finite-time observer based tracking control of uncertain heterogeneous underwater vehicles using adaptive sliding mode approach. Neurocomputing 2022, 481, 322–332.
  12. Chen, T.; He, T.; Benesty, M.; Khotilovich, V.; Tang, Y.; Cho, H.; Chen, K.; Mitchell, R.; Cano, I.; Zhou, T.; et al. XGBoost: Extreme gradient boosting. R Package Version 0.4-2 2015, 1, 1–4.
  13. Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
  14. Yang, M.; Wang, Y.; Liang, Y.; Wang, C. A New Approach to System Design Optimization of Underwater Gliders. IEEE/ASME Trans. Mechatron. 2022, 27, 3494–3505.
  15. Pascanu, R.; Mikolov, T.; Bengio, Y. On the difficulty of training recurrent neural networks. In Proceedings of the 30th International Conference on Machine Learning, Atlanta, GA, USA, 17–19 June 2013; Dasgupta, S., McAllester, D., Eds.; PMLR: Atlanta, GA, USA, 2013; Volume 28, pp. 1310–1318.
  16. Bas, E.; Egrioglu, E.; Kolemen, E. Training simple recurrent deep artificial neural network for forecasting using particle swarm optimization. Granul. Comput. 2022, 7, 411–420.
  17. Kennedy, J.; Eberhart, R. Particle swarm optimization. In Proceedings of the ICNN'95 International Conference on Neural Networks, Perth, WA, Australia, 27 November–1 December 1995; Volume 4, pp. 1942–1948.
  18. Yang, X.S.; Slowik, A. Firefly algorithm. In Swarm Intelligence Algorithms; CRC Press: Boca Raton, FL, USA, 2020; pp. 163–174.
  19. Mirjalili, S. Genetic algorithm. In Evolutionary Algorithms and Neural Networks; Springer: Berlin/Heidelberg, Germany, 2019; pp. 43–55.
  20. Heidari, A.A.; Mirjalili, S.; Faris, H.; Aljarah, I.; Mafarja, M.; Chen, H. Harris hawks optimization: Algorithm and applications. Future Gener. Comput. Syst. 2019, 97, 849–872.
  21. Karaboga, D. Artificial bee colony algorithm. Scholarpedia 2010, 5, 6915.
  22. Yang, X.S.; Hossein Gandomi, A. Bat algorithm: A novel approach for global engineering optimization. Eng. Comput. 2012, 29, 464–483.
  23. Mirjalili, S.; Lewis, A. The whale optimization algorithm. Adv. Eng. Softw. 2016, 95, 51–67.
  24. Khishe, M.; Mosavi, M.R. Chimp optimization algorithm. Expert Syst. Appl. 2020, 149, 113338.
  25. Zivkovic, M.; Bacanin, N.; Tuba, E.; Strumberger, I.; Bezdan, T.; Tuba, M. Wireless Sensor Networks Life Time Optimization Based on the Improved Firefly Algorithm. In Proceedings of the 2020 International Wireless Communications and Mobile Computing (IWCMC), Limassol, Cyprus, 15–19 June 2020; pp. 1176–1181.
  26. Zivkovic, M.; Bacanin, N.; Zivkovic, T.; Strumberger, I.; Tuba, E.; Tuba, M. Enhanced Grey Wolf Algorithm for Energy Efficient Wireless Sensor Networks. In Proceedings of the 2020 Zooming Innovation in Consumer Technologies Conference (ZINC), Novi Sad, Serbia, 26–27 May 2020; pp. 87–92.
  27. Bacanin, N.; Tuba, E.; Zivkovic, M.; Strumberger, I.; Tuba, M. Whale optimization algorithm with exploratory move for wireless sensor networks localization. In Proceedings of the International Conference on Hybrid Intelligent Systems, Bhopal, India, 10–12 December 2019; pp. 328–338.
  28. Zivkovic, M.; Zivkovic, T.; Venkatachalam, K.; Bacanin, N. Enhanced Dragonfly Algorithm Adapted for Wireless Sensor Network Lifetime Optimization. In Data Intelligence and Cognitive Informatics; Springer: Berlin/Heidelberg, Germany, 2021; pp. 803–817.
  29. Bezdan, T.; Zivkovic, M.; Tuba, E.; Strumberger, I.; Bacanin, N.; Tuba, M. Glioma Brain Tumor Grade Classification from MRI Using Convolutional Neural Networks Designed by Modified FA. In Proceedings of the International Conference on Intelligent and Fuzzy Systems, Istanbul, Turkey, 19–21 July 2020; pp. 955–963.
  30. Zivkovic, M.; Bacanin, N.; Antonijevic, M.; Nikolic, B.; Kvascev, G.; Marjanovic, M.; Savanovic, N. Hybrid CNN and XGBoost Model Tuned by Modified Arithmetic Optimization Algorithm for COVID-19 Early Diagnostics from X-ray Images. Electronics 2022, 11, 3798.
  31. Jovanovic, D.; Antonijevic, M.; Stankovic, M.; Zivkovic, M.; Tanaskovic, M.; Bacanin, N. Tuning Machine Learning Models Using a Group Search Firefly Algorithm for Credit Card Fraud Detection. Mathematics 2022, 10, 2272.
  32. Petrovic, A.; Bacanin, N.; Zivkovic, M.; Marjanovic, M.; Antonijevic, M.; Strumberger, I. The AdaBoost Approach Tuned by Firefly Metaheuristics for Fraud Detection. In Proceedings of the 2022 IEEE World Conference on Applied Intelligence and Computing (AIC), Sonbhadra, India, 17–19 June 2022; pp. 834–839.
  33. Zivkovic, M.; Bacanin, N.; Venkatachalam, K.; Nayyar, A.; Djordjevic, A.; Strumberger, I.; Al-Turjman, F. COVID-19 cases prediction by using hybrid machine learning and beetle antennae search approach. Sustain. Cities Soc. 2021, 66, 102669.
  34. Zivkovic, M.; Venkatachalam, K.; Bacanin, N.; Djordjevic, A.; Antonijevic, M.; Strumberger, I.; Rashid, T.A. Hybrid Genetic Algorithm and Machine Learning Method for COVID-19 Cases Prediction. In Proceedings of the International Conference on Sustainable Expert Systems: ICSES 2020, Lalitpur, Nepal, 28–29 September 2021; Volume 176, p. 169.
  35. Bacanin, N.; Bezdan, T.; Tuba, E.; Strumberger, I.; Tuba, M.; Zivkovic, M. Task scheduling in cloud computing environment by grey wolf optimizer. In Proceedings of the 2019 27th Telecommunications Forum (TELFOR), Belgrade, Serbia, 26–27 November 2019; pp. 1–4. [Google Scholar]
  36. Bezdan, T.; Zivkovic, M.; Tuba, E.; Strumberger, I.; Bacanin, N.; Tuba, M. Multi-objective Task Scheduling in Cloud Computing Environment by Hybridized Bat Algorithm. In Proceedings of the International Conference on Intelligent and Fuzzy Systems, Istanbul, Turkey, 19–21 July 2020; pp. 718–725. [Google Scholar]
  37. Bezdan, T.; Zivkovic, M.; Antonijevic, M.; Zivkovic, T.; Bacanin, N. Enhanced Flower Pollination Algorithm for Task Scheduling in Cloud Computing Environment. In Machine Learning for Predictive Analysis; Springer: Berlin/Heidelberg, Germany, 2020; pp. 163–171. [Google Scholar]
  38. Zivkovic, M.; Bezdan, T.; Strumberger, I.; Bacanin, N.; Venkatachalam, K. Improved Harris Hawks Optimization Algorithm for Workflow Scheduling Challenge in Cloud–Edge Environment. In Computer Networks, Big Data and IoT; Springer: Berlin/Heidelberg, Germany, 2021; pp. 87–102. [Google Scholar]
  39. Bacanin, N.; Zivkovic, M.; Stoean, C.; Antonijevic, M.; Janicijevic, S.; Sarac, M.; Strumberger, I. Application of Natural Language Processing and Machine Learning Boosted with Swarm Intelligence for Spam Email Filtering. Mathematics 2022, 10, 4173. [Google Scholar] [CrossRef]
  40. Stankovic, M.; Antonijevic, M.; Bacanin, N.; Zivkovic, M.; Tanaskovic, M.; Jovanovic, D. Feature Selection by Hybrid Artificial Bee Colony Algorithm for Intrusion Detection. In Proceedings of the 2022 International Conference on Edge Computing and Applications (ICECAA), Tamilnadu, India, 13–15 October 2022; pp. 500–505. [Google Scholar]
  41. Alzaqebah, A.; Aljarah, I.; Al-Kadi, O.; Damaševičius, R. A Modified Grey Wolf Optimization Algorithm for an Intrusion Detection System. Mathematics 2022, 10, 999. [Google Scholar] [CrossRef]
  42. Bezdan, T.; Cvetnic, D.; Gajic, L.; Zivkovic, M.; Strumberger, I.; Bacanin, N. Feature Selection by Firefly Algorithm with Improved Initialization Strategy. In Proceedings of the 7th Conference on the Engineering of Computer Based Systems, Novi Sad, Serbia, 26–27 May 2021; pp. 1–8. [Google Scholar]
  43. Bacanin, N.; Budimirovic, N.; Venkatachalam, K.; Jassim, H.S.; Zivkovic, M.; Askar, S.; Abouhawwash, M. Quasi-reflection learning arithmetic optimization algorithm firefly search for feature selection. Heliyon 2023, 9, e15378. [Google Scholar] [CrossRef]
  44. Bacanin, N.; Stoean, C.; Zivkovic, M.; Rakic, M.; Strulak-Wójcikiewicz, R.; Stoean, R. On the benefits of using metaheuristics in the hyperparameter tuning of deep learning models for energy load forecasting. Energies 2023, 16, 1434. [Google Scholar] [CrossRef]
  45. Stoean, C.; Zivkovic, M.; Bozovic, A.; Bacanin, N.; Strulak-Wójcikiewicz, R.; Antonijevic, M.; Stoean, R. Metaheuristic-Based Hyperparameter Tuning for Recurrent Deep Learning: Application to the Prediction of Solar Energy Generation. Axioms 2023, 12, 266. [Google Scholar] [CrossRef]
  46. Bacanin, N.; Jovanovic, L.; Zivkovic, M.; Kandasamy, V.; Antonijevic, M.; Deveci, M.; Strumberger, I. Multivariate energy forecasting via metaheuristic tuned long-short term memory and gated recurrent unit neural networks. Inf. Sci. 2023, 642, 119122. [Google Scholar] [CrossRef]
  47. Milosevic, S.; Bezdan, T.; Zivkovic, M.; Bacanin, N.; Strumberger, I.; Tuba, M. Feed-Forward Neural Network Training by Hybrid Bat Algorithm. In Proceedings of the Modelling and Development of Intelligent Systems: 7th International Conference, MDIS 2020, Sibiu, Romania, 22–24 October 2020; Revised Selected Papers 7. Springer International Publishing: Cham, Switzerland, 2021; pp. 52–66. [Google Scholar]
  48. Gajic, L.; Cvetnic, D.; Zivkovic, M.; Bezdan, T.; Bacanin, N.; Milosevic, S. Multi-layer Perceptron Training Using Hybridized Bat Algorithm. In Computational Vision and Bio-Inspired Computing; Springer: Berlin/Heidelberg, Germany, 2021; pp. 689–705. [Google Scholar]
  49. Bacanin, N.; Zivkovic, M.; Al-Turjman, F.; Venkatachalam, K.; Trojovskỳ, P.; Strumberger, I.; Bezdan, T. Hybridized sine cosine algorithm with convolutional neural networks dropout regularization application. Sci. Rep. 2022, 12, 6302. [Google Scholar] [CrossRef] [PubMed]
  50. Bacanin, N.; Stoean, C.; Zivkovic, M.; Jovanovic, D.; Antonijevic, M.; Mladenovic, D. Multi-Swarm Algorithm for Extreme Learning Machine Optimization. Sensors 2022, 22, 4204. [Google Scholar] [CrossRef] [PubMed]
  51. Jovanovic, L.; Jovanovic, D.; Bacanin, N.; Jovancai Stakic, A.; Antonijevic, M.; Magd, H.; Thirumalaisamy, R.; Zivkovic, M. Multi-Step Crude Oil Price Prediction Based on LSTM Approach Tuned by Salp Swarm Algorithm with Disputation Operator. Sustainability 2022, 14, 14616. [Google Scholar] [CrossRef]
  52. Bacanin, N.; Sarac, M.; Budimirovic, N.; Zivkovic, M.; AlZubi, A.A.; Bashir, A.K. Smart wireless health care system using graph LSTM pollution prediction and dragonfly node localization. Sustain. Comput. Inform. Syst. 2022, 35, 100711. [Google Scholar]
  53. Jovanovic, L.; Jovanovic, G.; Perisic, M.; Alimpic, F.; Stanisic, S.; Bacanin, N.; Zivkovic, M.; Stojic, A. The Explainable Potential of Coupling Metaheuristics-Optimized-XGBoost and SHAP in Revealing VOCs’ Environmental Fate. Atmosphere 2023, 14, 109. [Google Scholar] [CrossRef]
  54. Jovanovic, G.; Perisic, M.; Bacanin, N.; Zivkovic, M.; Stanisic, S.; Strumberger, I.; Alimpic, F.; Stojic, A. Potential of Coupling Metaheuristics-Optimized-XGBoost and SHAP in Revealing PAHs Environmental Fate. Toxics 2023, 11, 394. [Google Scholar] [CrossRef]
  55. Zhang, Z.; Yang, R.; Fang, Y. LSTM network based on on antlion optimization and its application in flight trajectory prediction. In Proceedings of the 2018 2nd IEEE Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC), Xi’an, China, 25–27 May 2018; pp. 1658–1662. [Google Scholar]
  56. Xiao, Z.; Li, P.; Havyarimana, V.; Hassana, G.M.; Wang, D.; Li, K. GOI: A novel design for vehicle positioning and trajectory prediction under urban environments. IEEE Sens. J. 2018, 18, 5586–5594. [Google Scholar] [CrossRef]
  57. Liu, J.; Shi, G.; Zhu, K. Vessel trajectory prediction model based on AIS sensor data and adaptive chaos differential evolution support vector regression (ACDE-SVR). Appl. Sci. 2019, 9, 2983. [Google Scholar] [CrossRef] [Green Version]
  58. Suo, Y.; Chen, W.; Claramunt, C.; Yang, S. A ship trajectory prediction framework based on a recurrent neural network. Sensors 2020, 20, 5133. [Google Scholar] [CrossRef]
  59. Cacchiani, V.; Ceschia, S.; Mignardi, S.; Buratti, C. Metaheuristic Algorithms for UAV Trajectory Optimization in Mobile Networks. In Proceedings of the Metaheuristics: 14th International Conference, MIC 2022, Syracuse, Italy, 11–14 July 2022; Springer: Berlin/Heidelberg, Germany, 2023; pp. 30–44. [Google Scholar]
  60. Hofmann, C.; Topputo, F. Rapid low-thrust trajectory optimization in deep space based on convex programming. J. Guid. Control Dyn. 2021, 44, 1379–1388. [Google Scholar] [CrossRef]
  61. Wood, D.A. Hybrid bat flight optimization algorithm applied to complex wellbore trajectories highlights the relative contributions of metaheuristic components. J. Nat. Gas Sci. Eng. 2016, 32, 211–221. [Google Scholar]
  62. Farzipour, A.; Elmi, R.; Nasiri, H. Detection of Monkeypox Cases Based on Symptoms Using XGBoost and Shapley Additive Explanations Methods. Diagnostics 2023, 13, 2391. [Google Scholar] [CrossRef]
  63. Bhandari, M.; Yogarajah, P.; Kavitha, M.S.; Condell, J. Exploring the Capabilities of a Lightweight CNN Model in Accurately Identifying Renal Abnormalities: Cysts, Stones, and Tumors, Using LIME and SHAP. Appl. Sci. 2023, 13, 3125. [Google Scholar] [CrossRef]
  64. Fatahi, R.; Nasiri, H.; Homafar, A.; Khosravi, R.; Siavoshi, H.; Chehreh Chelgani, S. Modeling operational cement rotary kiln variables with explainable artificial intelligence methods–A “conscious lab” development. Part. Sci. Technol. 2023, 41, 715–724. [Google Scholar] [CrossRef]
  65. Dobrojevic, M.; Zivkovic, M.; Chhabra, A.; Sani, N.S.; Bacanin, N.; Amin, M.M. Addressing Internet of Things security by enhanced sine cosine metaheuristics tuned hybrid machine learning model and results interpretation based on SHAP approach. PeerJ Comput. Sci. 2023, 9, e1405. [Google Scholar] [PubMed]
  66. Lundberg, S.M.; Lee, S.I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 2017, 30, 4768–4777. [Google Scholar]
  67. Molnar, C. Interpretable Machine Learning; Lulu. com: Morrisville, NC, USA, 2020. [Google Scholar]
  68. Lundberg, S.M.; Erion, G.; Chen, H.; DeGrave, A.; Prutkin, J.M.; Nair, B.; Katz, R.; Himmelfarb, J.; Bansal, N.; Lee, S.I. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2020, 2, 56–67. [Google Scholar] [CrossRef]
  69. Stojić, A.; Stanić, N.; Vuković, G.; Stanišić, S.; Perišić, M.; Šoštarić, A.; Lazić, L. Explainable extreme gradient boosting tree-based prediction of toluene, ethylbenzene and xylene wet deposition. Sci. Total Environ. 2019, 653, 140–147. [Google Scholar] [CrossRef]
  70. Jovanovic, L.; Jovanovic, D.; Antonijevic, M.; Nikolic, B.; Bacanin, N.; Zivkovic, M.; Strumberger, I. Improving Phishing Website Detection Using a Hybrid Two-level Framework for Feature Selection and XGBoost Tuning. J. Web Eng. 2023, 22, 543–574. [Google Scholar] [CrossRef]
  71. Mohamed, A.W.; Hadi, A.A.; Mohamed, A.K.; Awad, N.H. Evaluating the performance of adaptive gainingsharing knowledge based algorithm on cec 2020 benchmark problems. In Proceedings of the 2020 IEEE Congress on Evolutionary Computation (CEC), Glasgow, UK, 19–24 July 2020; pp. 1–8. [Google Scholar]
  72. Yu, K.; Liang, J.; Qu, B.; Chen, X.; Wang, H. Parameters identification of photovoltaic models using an improved JAYA optimization algorithm. Energy Convers. Manag. 2017, 150, 742–753. [Google Scholar] [CrossRef]
  73. Chechkin, A.V.; Metzler, R.; Klafter, J.; Gonchar, V.Y. Introduction to the theory of Lévy flights. In Anomalous Transport: Foundations and Applications; Wiley-VCH Verlag GmbH & Co. KGaA: Weinheim, Germany, 2008; pp. 129–162. [Google Scholar]
  74. Yang, X.S.; Deb, S. Cuckoo search via Lévy flights. In Proceedings of the 2009 World congress on nature & biologically inspired computing (NaBIC), Coimbatore, India, 9–11 December 2009; pp. 210–214. [Google Scholar]
  75. Kaidi, W.; Khishe, M.; Mohammadi, M. Dynamic levy flight chimp optimization. Knowl.-Based Syst. 2022, 235, 107625. [Google Scholar] [CrossRef]
  76. Heidari, A.A.; Pahlavani, P. An efficient modified grey wolf optimizer with Lévy flight for optimization tasks. Appl. Soft Comput. 2017, 60, 115–134. [Google Scholar] [CrossRef]
  77. LaTorre, A.; Molina, D.; Osaba, E.; Poyatos, J.; Del Ser, J.; Herrera, F. A prescription of methodological guidelines for comparing bio-inspired optimization algorithms. Swarm Evol. Comput. 2021, 67, 100973. [Google Scholar] [CrossRef]
  78. Shapiro, S.S.; Francia, R. An approximate analysis of variance test for normality. J. Am. Stat. Assoc. 1972, 67, 215–216. [Google Scholar] [CrossRef]
  79. Wilcoxon, F. Individual comparisons by ranking methods. In Breakthroughs in Statistics; Springer: Berlin/Heidelberg, Germany, 1992; pp. 196–202. [Google Scholar]
  80. Yuan, Y.; Wu, L.; Zhang, X. Gini-Impurity index analysis. IEEE Trans. Inf. Forensics Secur. 2021, 16, 3154–3169. [Google Scholar] [CrossRef]
Figure 1. Class distribution bar chart and pie diagram.
Figure 2. Detailed flowchart of the proposed framework—XGBoost (left) and LSTM (right).
Figure 3. Vessel classification (error minimization) result visualization. (a) Objective box plot; (b) objective convergence; (c) precision–recall curve; (d) receiver operating characteristic curve; (e) data diversity.
Figure 4. Vessel classification confusion matrix of the proposed XG-BPSO approach.
Figure 5. Vessel classification (kappa maximization) result visualization. (a) Objective box plot; (b) objective convergence; (c) precision–recall curve; (d) receiver operating characteristic curve; (e) data diversity.
Figure 6. Vessel classification (Cohen's kappa) confusion matrix of the proposed XG-BPSO approach.
Figure 7. Vessel trajectory result visualization. (a) Objective box plot; (b) R² box plot; (c) objective KDE plot; (d) R² KDE plot; (e) objective swarm diversity; (f) R² data diversity.
Figure 8. Vessel trajectory best predictions (denormalized)—LSTM-BPSO vs. LSTM-PSO. (a) LSTM-BPSO; (b) LSTM-PSO.
Figure 9. Feature importances determined using XGBoost and SHAP.
Figure 10. Feature dependence plots for cargo vessel types.
Figure 11. Feature dependence plots for tug vessel types.
Table 1. Overall objective function (classification error) results—classification error minimization experiment.

| Method | Best | Worst | Mean | Median | Std | Var |
|---|---|---|---|---|---|---|
| XG-BPSO | 0.002789 | 0.003470 | 0.003100 | 0.003130 | 0.000202 | 4.09 × 10⁻⁸ |
| XG-PSO | 0.003096 | 0.005375 | 0.004010 | 0.003929 | 0.000739 | 5.47 × 10⁻⁷ |
| XG-GA | 0.004354 | 0.015614 | 0.007037 | 0.005511 | 0.003554 | 1.26 × 10⁻⁵ |
| XG-ABC | 0.004898 | 0.007586 | 0.005736 | 0.005477 | 0.000817 | 6.67 × 10⁻⁷ |
| XG-BA | 0.003368 | 0.005171 | 0.004116 | 0.004184 | 0.000559 | 3.13 × 10⁻⁷ |
| XG-WOA | 0.003198 | 0.004762 | 0.004065 | 0.004133 | 0.000463 | 2.14 × 10⁻⁷ |
| XG-HHO | 0.003674 | 0.007076 | 0.004915 | 0.004286 | 0.001205 | 1.45 × 10⁻⁶ |
| XG-ChOA | 0.003062 | 0.009899 | 0.004720 | 0.004133 | 0.002032 | 4.13 × 10⁻⁶ |
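The summary statistics reported in Tables 1, 2, 5, 6 and 9 (best, worst, mean, median, standard deviation, and variance) are plain aggregates over the objective values of the independent runs. The sketch below illustrates that aggregation; the run values are hypothetical placeholders rather than the study's raw results, and since the paper does not state whether sample or population variance was used, the sample form (ddof=1) is assumed here.

```python
# Minimal sketch: aggregating per-run objective values into the
# best/worst/mean/median/std/var statistics of the result tables.
# The arrays below are placeholders, not values from the study.
import numpy as np

runs = {
    "XG-BPSO": np.array([0.002789, 0.003130, 0.003470]),  # hypothetical errors
    "XG-PSO":  np.array([0.003096, 0.003929, 0.005375]),
}

for method, errors in runs.items():
    print(f"{method}: best={errors.min():.6f} worst={errors.max():.6f} "
          f"mean={errors.mean():.6f} median={np.median(errors):.6f} "
          f"std={errors.std(ddof=1):.6f} var={errors.var(ddof=1):.2e}")
```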
Table 2. Overall indicator function (Cohen's kappa) results—classification error minimization experiment.

| Method | Best | Worst | Mean | Median | Std | Var |
|---|---|---|---|---|---|---|
| XG-BPSO | 0.995391 | 0.994267 | 0.994878 | 0.994829 | 0.000335 | 1.12 × 10⁻⁷ |
| XG-PSO | 0.994886 | 0.991114 | 0.993374 | 0.993507 | 0.001223 | 1.50 × 10⁻⁶ |
| XG-GA | 0.992803 | 0.974130 | 0.988359 | 0.990888 | 0.005892 | 3.47 × 10⁻⁵ |
| XG-ABC | 0.991899 | 0.987456 | 0.990516 | 0.990946 | 0.001351 | 1.83 × 10⁻⁶ |
| XG-BA | 0.994435 | 0.991452 | 0.993197 | 0.993085 | 0.000925 | 8.56 × 10⁻⁷ |
| XG-WOA | 0.994717 | 0.992126 | 0.993282 | 0.993169 | 0.000766 | 5.87 × 10⁻⁷ |
| XG-HHO | 0.993928 | 0.988296 | 0.991873 | 0.992917 | 0.001995 | 3.98 × 10⁻⁶ |
| XG-ChOA | 0.994942 | 0.983617 | 0.992197 | 0.993169 | 0.003366 | 1.13 × 10⁻⁵ |
Table 3. Detailed per-class metrics for the best-performing XG-BPSO model—classification error minimization experiment.

| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Cargo | 0.998406 | 0.999527 | 0.998966 | 16921 |
| Dredging | 0.987097 | 0.995662 | 0.991361 | 461 |
| Fishing | 1.000000 | 0.994460 | 0.997222 | 1083 |
| HSC | 0.985207 | 0.994003 | 0.989599 | 335 |
| Law enforcement | 1.000000 | 1.000000 | 1.000000 | 136 |
| Military | 0.987085 | 0.988909 | 0.987996 | 541 |
| Passenger | 0.998011 | 0.995370 | 0.996689 | 1512 |
| Pilot | 1.000000 | 0.996139 | 0.998066 | 259 |
| Pleasure | 1.000000 | 1.000000 | 1.000000 | 7 |
| Port tender | 1.000000 | 1.000000 | 1.000000 | 5 |
| Reserved | 0.937500 | 0.849057 | 0.891089 | 53 |
| SAR | 1.000000 | 0.933333 | 0.965517 | 45 |
| Sailing | 0.900000 | 0.642857 | 0.750000 | 14 |
| Tanker | 0.998580 | 0.997730 | 0.998155 | 7050 |
| Towing | 0.880000 | 0.904110 | 0.891892 | 73 |
| Towing long/wide | 0.875000 | 0.980000 | 0.924528 | 50 |
| Tug | 0.995272 | 0.988263 | 0.991755 | 852 |
| accuracy | 0.997211 | 0.997211 | 0.997211 | 29397 |
| macro avg | 0.973068 | 0.956438 | 0.963108 | 29397 |
| weighted avg | 0.997223 | 0.997212 | 0.997194 | 29397 |
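The layout of Table 3 matches the standard scikit-learn classification report. As a minimal sketch, the snippet below shows how such a per-class table and the accompanying Cohen's kappa can be produced; the label arrays are hypothetical stand-ins for the AIS test labels and the XG-BPSO predictions, not the study's data.

```python
# Minimal sketch: per-class precision/recall/F1 (as in Table 3) plus
# Cohen's kappa with scikit-learn. y_true/y_pred are hypothetical.
from sklearn.metrics import classification_report, cohen_kappa_score

y_true = ["Cargo", "Tanker", "Tug", "Cargo", "Fishing", "Cargo"]
y_pred = ["Cargo", "Tanker", "Tug", "Tanker", "Fishing", "Cargo"]

print(classification_report(y_true, y_pred, digits=6))  # per-class table
print("Cohen's kappa:", cohen_kappa_score(y_true, y_pred))
```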
Table 4. Best obtained models' hyperparameters for multiclass classification—classification error minimization experiment.

| Method | Learning Rate | Min Child Weight | Subsample | Colsample by Tree | Max Depth | Gamma |
|---|---|---|---|---|---|---|
| XG-BPSO | 0.593383 | 1.000000 | 0.984577 | 1.000000 | 10 | 0.000000 |
| XG-PSO | 0.716738 | 2.148387 | 1.000000 | 0.931689 | 10 | 0.239401 |
| XG-GA | 0.741573 | 1.000000 | 0.944507 | 0.685084 | 9 | 0.382662 |
| XG-ABC | 0.709378 | 1.860859 | 0.684326 | 0.710326 | 10 | 0.000000 |
| XG-BA | 0.724748 | 3.134685 | 1.000000 | 1.000000 | 10 | 0.800000 |
| XG-WOA | 0.770394 | 3.170319 | 1.000000 | 1.000000 | 10 | 0.279516 |
| XG-HHO | 0.688403 | 2.552526 | 1.000000 | 1.000000 | 10 | 0.751155 |
| XG-ChOA | 0.642902 | 1.146574 | 1.000000 | 0.931855 | 10 | 0.637534 |
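To make the tuned configurations concrete, the sketch below plugs the best XG-BPSO row of Table 4 into XGBoost's scikit-learn wrapper. Settings not covered by the tuning, such as the objective and the number of boosting rounds, are assumptions for illustration only, not values reported by the study.

```python
# Illustrative sketch: instantiating XGBoost with the best XG-BPSO
# hyperparameters from Table 4. objective and n_estimators are assumed.
from xgboost import XGBClassifier

model = XGBClassifier(
    learning_rate=0.593383,
    min_child_weight=1.0,
    subsample=0.984577,
    colsample_bytree=1.0,
    max_depth=10,
    gamma=0.0,
    objective="multi:softprob",  # assumed multiclass setting
    n_estimators=100,            # assumed; not reported in the table
)
# model.fit(X_train, y_train) would then train on the AIS features.
```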
Table 5. Overall objective function (Cohen's kappa) results—Cohen's kappa maximization experiment.

| Method | Best | Worst | Mean | Median | Std | Var |
|---|---|---|---|---|---|---|
| XG-BPSO | 0.995561 | 0.993814 | 0.994878 | 0.994970 | 0.000531 | 2.82 × 10⁻⁷ |
| XG-PSO | 0.994379 | 0.990388 | 0.993022 | 0.993561 | 0.001328 | 1.76 × 10⁻⁶ |
| XG-GA | 0.994098 | 0.982394 | 0.990003 | 0.990381 | 0.003349 | 1.12 × 10⁻⁵ |
| XG-ABC | 0.992582 | 0.983179 | 0.989497 | 0.990521 | 0.002917 | 8.51 × 10⁻⁶ |
| XG-BA | 0.994660 | 0.990496 | 0.993149 | 0.993396 | 0.001194 | 1.42 × 10⁻⁶ |
| XG-WOA | 0.994660 | 0.989709 | 0.993029 | 0.993369 | 0.001442 | 2.08 × 10⁻⁶ |
| XG-HHO | 0.994715 | 0.991679 | 0.993612 | 0.993901 | 0.000850 | 7.23 × 10⁻⁷ |
| XG-ChOA | 0.994379 | 0.988417 | 0.992685 | 0.993227 | 0.001796 | 3.23 × 10⁻⁶ |
Table 6. Overall indicator function (classification error) results—Cohen's kappa maximization experiment.

| Method | Best | Worst | Mean | Median | Std | Var |
|---|---|---|---|---|---|---|
| XG-BPSO | 0.002687 | 0.003742 | 0.003100 | 0.003045 | 0.000321 | 1.03 × 10⁻⁷ |
| XG-PSO | 0.003402 | 0.005817 | 0.004222 | 0.003895 | 0.000803 | 6.46 × 10⁻⁷ |
| XG-GA | 0.003572 | 0.010647 | 0.006047 | 0.005817 | 0.002024 | 4.10 × 10⁻⁶ |
| XG-ABC | 0.004490 | 0.010171 | 0.006353 | 0.005732 | 0.001763 | 3.11 × 10⁻⁶ |
| XG-BA | 0.003232 | 0.005749 | 0.004146 | 0.003997 | 0.000721 | 5.21 × 10⁻⁷ |
| XG-WOA | 0.003232 | 0.006225 | 0.004218 | 0.004014 | 0.000871 | 7.59 × 10⁻⁷ |
| XG-HHO | 0.003198 | 0.005035 | 0.003865 | 0.003691 | 0.000514 | 2.64 × 10⁻⁷ |
| XG-ChOA | 0.003402 | 0.007008 | 0.004426 | 0.004099 | 0.001086 | 1.18 × 10⁻⁶ |
Table 7. Detailed per-class metrics for the best-performing XG-BPSO model—Cohen's kappa maximization experiment.

| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Cargo | 0.998818 | 0.999114 | 0.998966 | 16921 |
| Dredging | 0.984979 | 0.995662 | 0.990291 | 461 |
| Fishing | 1.000000 | 0.995383 | 0.997686 | 1083 |
| HSC | 0.979412 | 0.994030 | 0.986667 | 335 |
| Law enforcement | 1.000000 | 0.985294 | 0.992593 | 136 |
| Military | 0.992647 | 0.998152 | 0.995392 | 541 |
| Passenger | 0.999336 | 0.996032 | 0.997681 | 1512 |
| Pilot | 0.992337 | 1.000000 | 0.996154 | 259 |
| Pleasure | 1.000000 | 1.000000 | 1.000000 | 7 |
| Port tender | 1.000000 | 1.000000 | 1.000000 | 5 |
| Reserved | 0.921569 | 0.886792 | 0.903846 | 53 |
| SAR | 1.000000 | 0.888889 | 0.941176 | 45 |
| Sailing | 1.000000 | 0.714286 | 0.833333 | 14 |
| Tanker | 0.997732 | 0.998298 | 0.998015 | 7050 |
| Towing | 0.942857 | 0.904110 | 0.923077 | 73 |
| Towing long/wide | 0.842105 | 0.960000 | 0.897196 | 50 |
| Tug | 0.994097 | 0.988263 | 0.991171 | 852 |
| accuracy | 0.997313 | 0.997313 | 0.997313 | 29397 |
| macro avg | 0.979170 | 0.959077 | 0.967250 | 29397 |
| weighted avg | 0.997346 | 0.997313 | 0.997303 | 29397 |
Table 8. Best obtained models' hyperparameters for multiclass classification—Cohen's kappa maximization experiment.

| Method | Learning Rate | Min Child Weight | Subsample | Colsample by Tree | Max Depth | Gamma |
|---|---|---|---|---|---|---|
| XG-BPSO | 0.828630 | 2.737313 | 1.000000 | 0.868831 | 10 | 0.071057 |
| XG-PSO | 0.744281 | 3.435928 | 1.000000 | 0.872910 | 10 | 0.785282 |
| XG-GA | 0.730659 | 3.499590 | 1.000000 | 0.872590 | 10 | 0.256810 |
| XG-ABC | 0.681233 | 1.000000 | 1.000000 | 0.903165 | 9 | 0.550955 |
| XG-BA | 0.900000 | 6.501873 | 1.000000 | 0.871536 | 10 | 0.000000 |
| XG-WOA | 0.813991 | 3.414476 | 1.000000 | 0.860721 | 10 | 0.800000 |
| XG-HHO | 0.707112 | 2.440472 | 1.000000 | 1.000000 | 10 | 0.260687 |
| XG-ChOA | 0.734595 | 1.990239 | 1.000000 | 0.874517 | 10 | 0.684537 |
Table 9. Overall objective function results—trajectory forecasting experiment.

| Method | Best | Worst | Mean | Median | Std | Var |
|---|---|---|---|---|---|---|
| LSTM-BPSO | 3.30 × 10⁻⁵ | 8.75 × 10⁻⁵ | 5.10 × 10⁻⁵ | 3.83 × 10⁻⁵ | 1.90 × 10⁻⁵ | 3.62 × 10⁻¹⁰ |
| LSTM-PSO | 3.83 × 10⁻⁵ | 1.04 × 10⁻⁴ | 6.72 × 10⁻⁵ | 6.11 × 10⁻⁵ | 2.84 × 10⁻⁵ | 8.06 × 10⁻¹⁰ |
| LSTM-GA | 5.82 × 10⁻⁵ | 1.25 × 10⁻⁴ | 1.01 × 10⁻⁴ | 1.14 × 10⁻⁴ | 2.83 × 10⁻⁵ | 8.03 × 10⁻¹⁰ |
| LSTM-ABC | 5.13 × 10⁻⁵ | 9.17 × 10⁻⁵ | 6.85 × 10⁻⁵ | 5.75 × 10⁻⁵ | 1.79 × 10⁻⁵ | 3.20 × 10⁻¹⁰ |
| LSTM-BA | 4.86 × 10⁻⁵ | 6.04 × 10⁻⁵ | 5.28 × 10⁻⁵ | 5.22 × 10⁻⁵ | 4.80 × 10⁻⁶ | 2.30 × 10⁻¹¹ |
| LSTM-WOA | 6.45 × 10⁻⁵ | 1.02 × 10⁻⁴ | 8.36 × 10⁻⁵ | 8.65 × 10⁻⁵ | 1.48 × 10⁻⁵ | 2.19 × 10⁻¹⁰ |
| LSTM-HHO | 5.17 × 10⁻⁵ | 8.29 × 10⁻⁵ | 6.73 × 10⁻⁵ | 6.04 × 10⁻⁵ | 1.21 × 10⁻⁵ | 1.47 × 10⁻¹⁰ |
| LSTM-ChOA | 5.22 × 10⁻⁵ | 1.01 × 10⁻⁴ | 7.22 × 10⁻⁵ | 6.83 × 10⁻⁵ | 1.82 × 10⁻⁵ | 3.32 × 10⁻¹⁰ |
Table 10. Detailed metrics for the best performing models—trajectory forecasting experiment.

| Method | R² | MAE | MSE | RMSE | IoA | EDE |
|---|---|---|---|---|---|---|
| LSTM-BPSO | 0.997921 | 0.007313 | 0.000098 | 0.009985 | 0.999769 | 0.012385 |
| LSTM-PSO | 0.997912 | 0.006937 | 0.000099 | 0.009987 | 0.999774 | 0.011048 |
| LSTM-GA | 0.997311 | 0.008302 | 0.000128 | 0.011299 | 0.999708 | 0.013484 |
| LSTM-ABC | 0.997361 | 0.008045 | 0.000126 | 0.011216 | 0.999711 | 0.012139 |
| LSTM-BA | 0.997554 | 0.007366 | 0.000117 | 0.010794 | 0.999735 | 0.011591 |
| LSTM-WOA | 0.997002 | 0.009676 | 0.000142 | 0.011930 | 0.999672 | 0.015390 |
| LSTM-HHO | 0.997090 | 0.009045 | 0.000139 | 0.011795 | 0.999679 | 0.013913 |
| LSTM-ChOA | 0.997654 | 0.008605 | 0.000111 | 0.010547 | 0.999744 | 0.013513 |
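Most of the scores in Table 10 are standard regression metrics; only the index of agreement (IoA) and the Euclidean distance error (EDE) call for a definition. The sketch below assumes Willmott's index of agreement and a mean Euclidean distance between true and predicted coordinate pairs. Both are plausible readings rather than the paper's confirmed formulas, and the coordinate arrays are hypothetical.

```python
# Minimal sketch of the Table 10 metrics. IoA here follows Willmott's
# definition and EDE is taken as the mean Euclidean distance between
# true and predicted (lat, lon) points -- both assumed definitions.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([[0.10, 0.20], [0.15, 0.25], [0.20, 0.30]])  # hypothetical
y_pred = np.array([[0.11, 0.19], [0.14, 0.26], [0.21, 0.31]])

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_true, y_pred)
ioa = 1 - np.sum((y_true - y_pred) ** 2) / np.sum(
    (np.abs(y_pred - y_true.mean()) + np.abs(y_true - y_true.mean())) ** 2)
ede = np.mean(np.linalg.norm(y_true - y_pred, axis=1))  # per-point distance
print(f"R2={r2:.6f} MAE={mae:.6f} MSE={mse:.6f} RMSE={rmse:.6f} "
      f"IoA={ioa:.6f} EDE={ede:.6f}")
```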
Table 11. Best obtained model hyperparameters—trajectory forecasting experiment.

| Method | Learning Rate | Dropout | Epochs | Layers | Neurons Layer 1 | Neurons Layer 2 |
|---|---|---|---|---|---|---|
| LSTM-BPSO | 0.000656 | 0.005000 | 50 | 1 | 27 | / |
| LSTM-PSO | 0.000125 | 0.020000 | 40 | 1 | 31 | / |
| LSTM-GA | 0.006729 | 0.060654 | 39 | 1 | 22 | / |
| LSTM-ABC | 0.004661 | 0.105701 | 44 | 2 | 16 | 24 |
| LSTM-BA | 0.000086 | 0.005000 | 40 | 1 | 32 | / |
| LSTM-WOA | 0.003918 | 0.154873 | 34 | 1 | 27 | / |
| LSTM-HHO | 0.001845 | 0.093174 | 41 | 1 | 26 | / |
| LSTM-ChOA | 0.002802 | 0.144773 | 46 | 1 | 20 | / |
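As a worked reading of Table 11, the sketch below assembles the best LSTM-BPSO configuration (one recurrent layer of 27 neurons, dropout 0.005, learning rate 0.000656, 50 training epochs) in Keras. The lookback window, feature count, output layer, and loss are assumptions based on the multivariate trajectory setup, not values reported in the table.

```python
# Illustrative sketch: the tuned LSTM-BPSO network from Table 11.
# lags, n_features, the Dense output, and the loss are assumptions.
import tensorflow as tf

lags, n_features = 6, 4  # assumed lookback window and AIS feature count
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(27, input_shape=(lags, n_features)),
    tf.keras.layers.Dropout(0.005),
    tf.keras.layers.Dense(2),  # e.g., next latitude and longitude
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.000656),
              loss="mse")
# model.fit(X_train, y_train, epochs=50) would reproduce the tuned setup.
```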
Table 12. Shapiro–Wilk normality tests.

| Problem | BPSO | PSO | GA | ABC | BA | WOA | HHO | ChOA |
|---|---|---|---|---|---|---|---|---|
| Vessel error | 0.032 | 0.041 | 0.046 | 0.018 | 0.021 | 0.033 | 0.035 | 0.040 |
| Vessel kappa | 0.043 | 0.015 | 0.044 | 0.031 | 0.032 | 0.038 | 0.021 | 0.035 |
| Trajectory | 0.011 | 0.003 | 0.015 | 0.009 | 0.019 | 0.006 | 0.010 | 0.013 |
Table 13. Wilcoxon signed-rank test p-values for all three experiments (BPSO vs. others).

| Problem | PSO | GA | ABC | BA | WOA | HHO | ChOA |
|---|---|---|---|---|---|---|---|
| Vessel error | 0.042 | 0.003 | 0.018 | 0.037 | 0.040 | 0.027 | 0.031 |
| Vessel kappa | 0.043 | 0.027 | 0.023 | 0.045 | 0.044 | 0.047 | 0.036 |
| Trajectory | 0.045 | 0.007 | 0.041 | 0.054 | 0.024 | 0.044 | 0.032 |
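Because every Shapiro–Wilk p-value in Table 12 falls below 0.05, normality of the per-run results is rejected, which is what motivates the non-parametric Wilcoxon signed-rank comparison of Table 13. A minimal sketch of both tests with SciPy follows; the per-run objective values are hypothetical placeholders, not the study's raw data.

```python
# Minimal sketch: Shapiro-Wilk normality test per method (Table 12) and
# a paired Wilcoxon signed-rank test of BPSO vs. a competitor (Table 13).
from scipy.stats import shapiro, wilcoxon

bpso = [0.002789, 0.003045, 0.003130, 0.003198, 0.003742]  # hypothetical runs
pso = [0.003402, 0.003895, 0.004010, 0.004290, 0.005817]

w_stat, w_p = shapiro(bpso)          # normality of one method's results
print("Shapiro-Wilk p-value:", w_p)  # p < 0.05 -> reject normality

t_stat, t_p = wilcoxon(bpso, pso)    # paired comparison, BPSO vs. PSO
print("Wilcoxon p-value:", t_p)      # p < 0.05 -> significant difference
```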