*Article* **A Pilot Study of Stacked Autoencoders for Ship Mode Classification**

**Ji-Yoon Kim <sup>1</sup> and Jin-Seok Oh <sup>2,\*</sup>**


**Abstract:** With the evolution of the shipping market, artificial intelligence research using ship data is being actively conducted. Smart ships and reducing ship greenhouse gas emissions are among the most actively researched topics in the maritime transport industry. Owing to the massive advances in information and communications technology, the internet of things, and big data technologies, smart ships have emerged as a very promising proposition. Numerous methodologies and network architectures can smoothly collect data from ships that are currently in operation, as is currently done in research on reducing ship fuel consumption by deep learning or conventional methods. Many extensive studies of stacked autoencoders have been carried out in the past few years. However, prior studies have not addressed the development of algorithms or deep learning-based models to classify the operating states of ships. In this paper, we propose for the first time a deep learning-based stacked autoencoder model that can classify the operating state of a ship broadly into the categories of At Sea, Stand By, and In Port, using actual ship power load data. In order to maximize the model's performance, the stacked autoencoder architecture, the number of hidden layers, and the number of neurons in each layer were evaluated using performance metrics such as the true positive rate, false positive rate, Matthews correlation coefficient, and accuracy. It was found that increasing the model's complexity did not always improve its performance, and the feasibility of developing and utilizing an efficient model was verified on real data. The best-performing model had a (5–128) structure with latent layer size 9. It achieved a true positive rate of 0.9035, a false positive rate of 0.0541, a Matthews correlation coefficient of 0.9054, and an accuracy of 0.9612, clearly demonstrating that deep learning can be used to analyze ship operating modes.

**Keywords:** ship mode; autoencoder; ship mode classification; deep-learning model

#### **1. Introduction**

In the maritime transport industry, smart ships and reducing ship greenhouse gas emissions have been actively researched [1]. Smart ships have emerged as information and communications technology, the internet of things, and big data technologies have advanced. Unlike a conventional ship, a smart ship is characterized by its ability to use data collected by sensors installed within the ship to self-navigate or to provide appropriate information to assist in the decisions of crew members operating the ship [2]. Studies on reducing ship greenhouse gas emissions have mainly focused on reducing ship fuel consumption and eco-friendly ships that do not use oil.

Research on smart ships has been actively pursued, and various methodologies and network architectures that can smoothly collect data from ships in operation have been proposed. This research has included areas such as big data collection systems [3], cyber security considerations [4], data management to reduce learning costs [5], framework structures for index systems [6], surveys of architectures and applications [7], and priority items for smart shipping [8]. Furthermore, various data on actual ships are being collected and used in research on reducing ship fuel consumption by deep learning or conventional methods: knowledge-free path planning with reinforcement learning [9], energy-saving automatic path planning algorithms [10], energy-saving management systems for smart ships [11], power scheduling for saving energy with reinforcement learning [12], and forecasting ship fuel consumption [13]. However, the classification of ship operating modes remains unresearched.

**Citation:** Kim, J.-Y.; Oh, J.-S. A Pilot Study of Stacked Autoencoders for Ship Mode Classification. *Appl. Sci.* **2023**, *13*, 5491. https://doi.org/10.3390/app13095491

Academic Editors: Enjin Zhao, Hao Qin and Lin Mu

Received: 20 February 2023; Revised: 17 April 2023; Accepted: 26 April 2023; Published: 28 April 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Due to the characteristics of ship operations, the operating state of a ship can be broadly classified as At Sea, Stand By, or In Port. In the At Sea state, all of the devices on the ship are connected and powered, and the load changes in each device are small. Stand By refers to the state in which the ship is entering or exiting a port, and it is characterized by large fluctuations in the total power consumption due to changes in the ship speed and the use of auxiliary devices. Lastly, In Port refers to the operating state in which a ship has entered a port and cargo is being loaded or unloaded from the ship. In this state, fewer auxiliary devices are being powered on the ship, and total power consumption and power fluctuations are low. The power fluctuation characteristics of each ship type are as follows:


However, although ship data are presently being acquired in large quantities, the classification of ship operating states has not been adequately researched. As such, researchers are faced with the problem of needing to label the collected ship data manually based on ship voyage information before they can be used. Research on ship state classification models that can classify the operating states of ships is required to overcome this issue.

Prior studies have not addressed the development of algorithms or deep learning-based models to classify the operating states of ships. Autoencoders are used in handwriting recognition [14], anomaly detection [15], fault diagnosis [16], and fraud detection [17] and produce better results than existing algorithms. Thus, we selected the autoencoder model for classifying ship data.

However, the performance of the stacked autoencoder model depends on proper control of the components. When utilizing an autoencoder-based classification model, model design considerations include:


In order to address these issues, we conducted comparative experiments on the structure of the classification model, the appropriate values for the number of hidden layers and the number of neurons in each layer, and changes in the size of the latent layer. We then identify the best model for classifying the ship's operating state as At Sea, Stand By, or In Port using actual ship power load data.

This paper makes the following contributions: First, the structure of the first stacked autoencoder model using actual ship data is presented. Second, the performance change of the model according to the components of the stacked autoencoder was investigated. In particular, since there is no previous study that uses actual ship data, we design and perform performance comparison experiments of classification models according to structural changes of the stacked autoencoder. Third, the experimental results are analyzed.

#### **2. Related Works**

In the past several years, many studies have applied autoencoder models to solve practical problems. An autoencoder [18] is a neural network model that learns to make its output approximate its input, and its main purpose is to learn informative representations of data in an unsupervised manner. Types of autoencoders include the stacked autoencoder, sparse autoencoder, denoising autoencoder, contractive autoencoder, and variational autoencoder [19].

The stacked autoencoder can learn efficiently to create robust features from training data, and research has been conducted on its benefits in solving practical problems. Ghosh et al. [20] used a stacked autoencoder model to classify human emotional data and achieved good results in categorizing human emotions with a spectrogram dataset. Ghosh's approach produced better results than traditional methods, which could not distinguish between happy and angry speakers.

Ambaw et al. [21] compared conventional neural networks, support vector machines, and stacked autoencoders on the recognition of continuous-phase frequency-shift keying under carrier frequency offset, noisy, and fast-fading channel conditions. In that study, the three features selected for recognition were the approximate entropy of the received signal, the approximate entropy of the received signal's phase, and the approximate entropy of the instantaneous frequency of the received signal. It was found that the stacked autoencoder performed better than support vector machines and traditional neural networks, giving better accuracy at most signal-to-noise ratios.

Singh et al. [22] proposed a stacked autoencoder model to reduce complexity and processing time for detecting epilepsy. The model classified epileptic data into normal, ictal, and preictal. They selected machine learning algorithms such as Bayes Net, Naïve Bayes, the multilayer perceptron, radial basis function neural networks, and the C4.5 decision tree classifier as comparison models and showed that the stacked autoencoder model had the best performance score with the least processing time.

Law et al. [23] suggested cascading two types of networks, stacked autoencoders and extreme learning machines, for multi-label classification to enhance a stacked autoencoder's performance. The proposed model was compared with eleven other algorithms on seven datasets and showed promising performance.

Aouedi et al. [24] introduced a stacked sparse autoencoder model to integrate feature extraction and classification processes. The model uses denoising and a dropout technique to enhance feature extraction performance and prevent overfitting. It was proven that the model produced a better output than conventional models.

Deperlioglu [25] built a stacked autoencoder classification model for heart sound analysis. Traditional methods use data transforms to get the S1 and S2 segments of heart sounds. Deperlioglu's novel approach was to utilize only a stacked autoencoder model to get segments of heart sounds for direct classification. This model was compared with conventional algorithms. The proposed model's performance was similar to prior models. According to this study, a stacked autoencoder can be used in the medical field with efficient and effective classification results.

Gokhale et al. [26] compared a proposed stacked autoencoder model with seven previously established algorithms using ten datasets to find the key genes for cancer. Traditional gene selection approaches using statistical or feature selection methods have accuracy problems. However, the proposed stacked autoencoder-based framework outperformed conventional methods in this study.

Arafa et al. [27] introduced a reduced noise-autoencoder for solving the problem of imbalanced data in genomic datasets. Arafa's approach was able to solve the dimensionality problem with the stacked autoencoder with feature reduction and create new low-dimensional data. In addition, the accuracy score was improved by more than eight percent.

#### **3. Theoretical Background**

#### *3.1. Autoencoder*

An autoencoder consists of an encoder, a latent layer, and a decoder, as shown in Figure 1. The encoder is called a recognition network and extracts the features of the original data entered as input. The layer that stores the extracted features during this task is called the latent layer. The decoder is known as a generative network, and it converts the features into output. Through this process, the autoencoder can reorganize the core information in the input data.

**Figure 1.** Autoencoder.

The encoder of the autoencoder can be defined as

$$h = \sigma(W\_e x + b\_e)$$

Here, *x* is the input data, and *We* and *be* are the encoder's weight matrix and bias vector, respectively. *σ* is the activation function, and *h* is the encoder output. The decoder of the autoencoder can be expressed as

$$\hat{x} = \sigma(W\_d h + b\_d).$$

In the decoder, the encoder output *h* is used as the input. *Wd* and *bd* are the decoder's weight matrix and bias vector, respectively, *σ* is the activation function, and *x*ˆ is the decoder output.

The autoencoder learns in order to make the decoder's output value as similar to the input value as possible. Therefore, minimizing the difference between the input value and the decoder output value by adjusting the parameters during the training of the autoencoder model is important. As such, selecting a loss function that is suitable for the goals of the appropriate model is also critical. If the mean square error is used as the loss function, it can be expressed as follows:

$$L(x, \hat{x}) = \frac{1}{N} \sum\_{i=1}^{N} (\hat{x}\_i - x\_i)^2$$

Here, *x* is the input value, *x*ˆ is the decoder output value, and *N* is the number of data points.
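The encoder, decoder, and loss equations above can be sketched in a few lines of NumPy. The layer sizes, the sigmoid choice for *σ*, and the random weights below are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Illustrative sizes: 8 input features compressed to a 3-neuron latent layer
n_in, n_latent = 8, 3
W_e = rng.normal(size=(n_latent, n_in)); b_e = np.zeros(n_latent)   # encoder parameters
W_d = rng.normal(size=(n_in, n_latent)); b_d = np.zeros(n_in)       # decoder parameters

x = rng.random(n_in)              # input data
h = sigmoid(W_e @ x + b_e)        # encoder: h = sigma(W_e x + b_e)
x_hat = sigmoid(W_d @ h + b_d)    # decoder: x_hat = sigma(W_d h + b_d)

loss = np.mean((x_hat - x) ** 2)  # mean-square-error loss L(x, x_hat)
```

Training would adjust `W_e`, `b_e`, `W_d`, and `b_d` to minimize `loss`, which is what a framework such as TensorFlow automates.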

#### *3.2. SAE*

An SAE [28], also known as a deep autoencoder, is organized as shown in Figure 2. It has a structure in which multiple hidden layers are contained in the encoder and decoder, and its structure is symmetrical in relation to the latent layer. The latent layer is located between the encoder and the decoder, as in an autoencoder, and it stores the feature data that are acquired from the encoder.

#### **Figure 2.** SAE.

The SAE model is trained using the greedy layer-wise training methodology [29]. This methodology was proposed to determine the optimal parameters of an SAE, and it has been proven to be effective in learning an SAE with multiple hidden layers. This methodology can also reduce the network size of the SAE model and increase the training speed. Furthermore, it has the advantage of potentially reducing the risk of overfitting.
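The greedy layer-wise methodology can be sketched as follows: each hidden layer is trained as a single-layer autoencoder on the codes produced by the previous layer, and its codes become the next layer's input. The sketch below uses scikit-learn's `MLPRegressor` as a stand-in single-layer autoencoder; the layer sizes and data are made up, and the paper's TensorFlow implementation may differ:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def pretrain_stack(X, layer_sizes, seed=0):
    """Train one single-layer autoencoder per hidden layer; feed each
    layer's code to the next (greedy layer-wise pretraining)."""
    codes, aes = X, []
    for size in layer_sizes:
        ae = MLPRegressor(hidden_layer_sizes=(size,), activation="logistic",
                          max_iter=300, random_state=seed)
        ae.fit(codes, codes)                     # reconstruct the layer's own input
        W, b = ae.coefs_[0], ae.intercepts_[0]   # keep the encoder half of this AE
        codes = 1.0 / (1.0 + np.exp(-(codes @ W + b)))
        aes.append(ae)
    return aes, codes

rng = np.random.default_rng(0)
X = rng.random((100, 8))                 # made-up data: 100 samples, 8 features
aes, latent = pretrain_stack(X, layer_sizes=[6, 3])
# latent holds the codes from the deepest (latent) layer
```

After pretraining, the stacked encoder weights would typically be fine-tuned end-to-end together with the classifier head.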

#### *3.3. Dataset*

Real ship data are not open-access data. Thus, we collected data from a real ship. The data in this study were those of a 13,000 TEU container ship used in actual operations. It is propelled by one MAN-Burmeister and Wain diesel engine and has four 3480 kW generators. Table 1 lists the specifications of the target ship.

**Table 1.** Specifications of a container ship.


The power load of the ship continuously fluctuates according to the operating state of the ship [30]. Furthermore, the steering equipment installed on the ship is powered by an electric motor, and the power load consumed by the electric motor is affected by the external resistance of the ship's hull [31]. Therefore, this study collected data on the electric load, which is the total electric load of the ship, as well as data that indicate the external resistance of the ship, such as its heading, rudder angle, water depth, water speed, wind angle, wind speed, and ship speed. Table 2 shows the types of data measured in this study.

**Table 2.** Various types of data measured.


The data were measured in 10 min intervals, and a total of 30,340 data values were collected. Figure 3 shows the collected ship power load data. Changes in the power load occurred as the ship was operated.
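As a quick sanity check (a derived figure, not stated in the original), 30,340 samples at 10-minute intervals correspond to roughly 210 days of recorded operation:

```python
# Number of samples and sampling interval as stated in the text
samples, interval_min = 30_340, 10
days = samples * interval_min / (60 * 24)   # minutes -> days
print(round(days, 1))   # -> 210.7
```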

**Figure 3.** Ship electric load.

Table 3 shows the number of data points according to the ship's state. The ship is most often operated At Sea, and the least amount of time is spent in Stand By. In addition, the measured data contain twice as many instances of the In Port state as the Stand By state.

**Table 3.** Number of data collected for each state of the ship.


#### **4. Approach**

This section describes the design approach for the SAE that classifies ship operating modes. In particular, we hoped to determine whether the structure of the SAE model and the size of its latent layer affected the operating mode classification performance.

#### *4.1. Overview*

An overview of the process of this research is given below.


#### *4.2. Model Design*

The SAE described in Section 2 was used in the operating mode classification model. To improve the performance of the SAE model, this study considered two aspects: the structure of the model and the size of the latent layer. Here, the model structure refers to the number of hidden layers within the encoder and decoder and the number of neurons used in each hidden layer. The size of the latent layer refers to the number of neurons in the latent layer. Figure 4 shows the basic structure of the model. The encoder and decoder were arranged in a symmetric form, and a softmax layer [32] was added to the output part of the decoder for classification.

**Figure 4.** SAE model structure.

The softmax layer uses a softmax activation function, which is employed in deep learning-based models to produce classifiers that assign data to three or more classes [33]. If the input vector is *z*, the softmax activation function can be expressed as follows:

$$\operatorname{softmax}(z)\_i = \frac{\exp(z\_i)}{\sum\_{j=1}^{k} \exp(z\_j)}$$

Here, *k* is the number of classes that should be output by the multi-class classifier, and exp(*zi*) and exp(*zj*) are the standard exponential functions of the *i*th and *j*th elements of the input vector.
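A small numeric illustration of the softmax function for *k* = 3 classes (the logits are made up):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))   # subtracting the max improves numerical stability
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1])   # made-up logits for k = 3 classes
p = softmax(z)
# p sums to 1, and the largest logit receives the largest probability
```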

#### *4.3. Evaluation Metrics*

This study selected accuracy, TPR, FPR, and MCC as the evaluation metrics to compare the performance of the models. The accuracy is the ratio of the number of data points that the multi-class classification model correctly classifies to the overall number of data points. The TPR is the proportion of actual positives that are correctly predicted, and the FPR is the proportion of actual negatives that are incorrectly predicted as positive; both are used to evaluate multi-class classification performance. Lastly, the MCC has the advantage of being able to express the confusion matrix of a multi-class classification model in a balanced way; it is a balanced evaluation metric that is well suited to representing the performance of such models [34,35]. The MCC was considered to be the most important of all the evaluation metrics because this study presents a model that performs multi-class classification. The evaluation metrics can be expressed as shown in the following equations:

$$\text{True Positive Rate (TPR)} = \frac{TP}{TP + FN}$$

$$\text{False Positive Rate (FPR)} = \frac{FP}{FP + TN}$$

$$Accuracy = \frac{TP + TN}{TP + FP + TN + FN}$$

$$\text{MCC} = \frac{c \times s - \sum\_{k=1}^{K} p\_k \times t\_k}{\sqrt{\left(s^2 - \sum\_{k=1}^{K} p\_k^2\right) \times \left(s^2 - \sum\_{k=1}^{K} t\_k^2\right)}}$$

Here, *TP* is the number of true positives, *TN* is the number of true negatives, *FP* is the number of false positives, and *FN* is the number of false negatives. *c* is the number of correctly predicted samples, and *s* is the total number of samples. *K* is the total number of classes, and *k* indexes the classes from 1 to *K*. *pk* is the number of times class *k* was predicted, and *tk* is the number of times class *k* truly occurred.
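All four metrics can be computed directly from a multi-class confusion matrix, following the equations above (rows as predicted classes and columns as actual classes, matching the confusion tables later in the paper). The example matrix is made up, and the macro-averaging of the per-class TPR and FPR is an assumption, since the paper does not state how per-class rates were aggregated:

```python
import numpy as np

def metrics(C):
    """TPR, FPR, MCC, and accuracy from confusion matrix C
    (rows = predicted class, columns = actual class)."""
    C = np.asarray(C, dtype=float)
    s = C.sum()            # total number of samples
    c = np.trace(C)        # correctly predicted samples
    p = C.sum(axis=1)      # p_k: times class k was predicted
    t = C.sum(axis=0)      # t_k: times class k truly occurred
    mcc = (c * s - p @ t) / np.sqrt((s**2 - p @ p) * (s**2 - t @ t))
    accuracy = c / s
    tp = np.diag(C)
    fn, fp = t - tp, p - tp
    tn = s - tp - fn - fp
    tpr = np.mean(tp / (tp + fn))   # macro-averaged over classes (assumption)
    fpr = np.mean(fp / (fp + tn))   # macro-averaged over classes (assumption)
    return tpr, fpr, mcc, accuracy

C = [[50, 2, 1],    # predicted At Sea
     [3, 20, 2],    # predicted Stand By
     [1, 1, 40]]    # predicted In Port
tpr, fpr, mcc, acc = metrics(C)   # acc = 110/120, roughly 0.917
```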

#### *4.4. Composition of Models for Comparison Experiments*

We performed experiments on SAE models that had various structures and latent layer sizes to find the best SAE model. The hidden layers of the model were all fully connected, and a rectified linear unit (ReLU) activation function was used. A total of five model structures were employed in the comparison experiments. We used (depth, size) to express the model structures and depict the composition of the models. Here, depth refers to the number of hidden layers in the encoder and decoder, including the latent layer. Size refers to the number of neurons in the first hidden layer of the encoder. The encoder had a structure in which the number of hidden layer neurons decreased by half compared to that in the previous layer. Additionally, the decoder structure was symmetrical with the encoder structure. Table 4 shows the structures of the models used in the comparison experiments.

**Table 4.** Structures of the models for comparison.


Here, to find the appropriate size of the latent layer, we performed experiments that changed the latent layer size to 3, 6, 9, and 12. In addition, the collected dataset was divided into training, testing, and validation datasets to prevent the overfitting problem when training the classification model: 60% of the entire dataset was used as the training dataset, 20% was used as the testing dataset, and the remaining 20% was utilized as the validation dataset. Furthermore, an Adam optimizer [36] was employed as the optimization function in the training of all models, with a learning rate of 1 × 10<sup>−4</sup>. The evaluation experiments for comparison were performed with the Python, Scikit-Learn, and TensorFlow libraries.
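Under one plausible reading of the (depth, size) notation — depth counts the encoder hidden layers, the latent layer, and the decoder hidden layers together, with each encoder layer half the width of the previous and a mirrored decoder — the specifications expand into layer widths as sketched below. The `sae_layers` helper and this reading are assumptions for illustration, not the paper's code:

```python
def sae_layers(depth, size, latent_size):
    """Expand a (depth, size) spec into hidden-layer widths.
    Assumed reading: depth = encoder hidden layers + latent layer
    + decoder hidden layers; encoder widths halve layer by layer."""
    n_side = (depth - 1) // 2                      # hidden layers per side
    encoder = [size // (2 ** i) for i in range(n_side)]
    return encoder + [latent_size] + encoder[::-1]  # decoder mirrors encoder

# The best-performing model in the paper: (5-128) with latent layer size 9
print(sae_layers(5, 128, 9))   # -> [128, 64, 9, 64, 128]
```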

#### **5. Experimental Results and Discussion**

#### *5.1. Experimental Results*

This section assesses the performance of the models by comparing the model structures used in the comparison experiments. For this analysis, 200 epochs of training were performed on all models, and the batch size was 64. Additionally, the evaluation metrics were used to evaluate the performance of each model structure, and the latent layer size suitable for each model structure was found. Comparative evaluations were also performed on the models that showed the best performance for each model structure.

Table 5 lists the evaluation results for the (5–32) structure. The accuracy performance is stable regardless of the latent layer size. However, the FPR and MCC values are affected by the latent layer size. A latent layer size of 9 produces the best performance in the (5–32) structure.


**Table 5.** Evaluation results of the (5–32) structure.

Table 6 shows the evaluation results for the (5–64) structure. Unlike the (5–32) structure, there is a significant difference in accuracy across latent layer sizes. The performance rapidly worsens when the latent layer size is 6, although the FPR performance is exceptional at that size. The performance comparison results obtained using MCC indicate that the best performance occurs when the latent layer size is 12.

**Table 6.** Evaluation results of the (5–64) structure.


Table 7 illustrates the evaluation results for the (5–128) structure. The TPR performance is best when the latent layer size is 12, and the FPR performance is best when the latent layer size is 6. However, the MCC and accuracy performance are best when the latent layer size is 9.


**Table 7.** Evaluation results of the (5–128) structure.

Table 8 lists the evaluation results for the (7–64) structure. The TPR performance is the best when the latent layer size is 3, and the FPR performance is the best when the latent layer size is 9. However, the MCC and accuracy performance are at their best when the latent layer size is 12.

**Table 8.** Evaluation results of the (7–64) structure.


Table 9 shows the evaluation results for the (7–128) structure. The TPR performance is the best when the latent layer size is 9, and the FPR, MCC, and accuracy performance are the best when the latent layer size is 6.



Table 10 illustrates the best model structures based on MCC scores. In the comparison results, the MCC evaluation metric confirms that the (5–128) structure is the best at classifying the ship operating mode when the latent layer size is 9. Furthermore, Figure 5 shows the MCC value according to the latent layer size of the models used in the comparison experiments.

**Table 10.** Best models from structures based on MCC score.


**Figure 5.** MCC results of structures with latent layer sizes.

Tables 11–15 exhibit the confusion matrices of the best model structures based on MCC scores. The rows of the tables present the classification results, and the columns show the actual classification classes. The diagonal cells of the matrices show the numbers of successful classification results, and the off-diagonal cells show the numbers of misclassified results.

Table 11 presents the confusion matrix of the (5–32) structure with a latent layer size of 9. It can be seen that the classification performance for the Stand By state is the best among the confusion matrix entries compared.


**Table 11.** Confusion matrix of (5–32) structure with a latent layer size of 9.

Table 12 shows the confusion matrix of the (5–64) structure with a latent layer size of 12. The classification performance for the At Sea state is the best among the confusion matrix entries compared.

**Table 12.** Confusion matrix of (5–64) structure with a latent layer size of 12.


Table 13 provides the confusion matrix of the (5–128) structure with a latent layer size of 9. The classification performance for the In Port and Stand By states is relatively high among the confusion matrix entries compared, whereas the classification performance for the At Sea state is the lowest. However, it can also be seen that the classification model performance for the At Sea state is not especially low, with only 20 misclassified data points.

**Table 13.** Confusion matrix of (5–128) structure with a latent layer size of 9.


Table 14 shows the confusion matrix of the (7–64) structure with a latent layer size of 12. The classification performance for the In Port state is the best among the confusion matrix entries compared. Conversely, the performance for the Stand By state is the lowest.

**Table 14.** Confusion matrix of (7–64) structure with a latent layer size of 12.


Table 15 presents the confusion matrix of the (7–128) structure with a latent layer size of 6. This model has the lowest average performance among the confusion matrix entries compared.

**Table 15.** Confusion matrix of (7–128) structure with a latent layer size of 6.


#### *5.2. Discussion*

Performance by Model Structure and Latent Size

This section discusses the importance of the findings of this study. First, we focused on the model structure, which was found to affect the classification performance: the (5–128) structure gave the best results. Notably, this less complex structure outperformed the more complex (7–64) and (7–128) structures.

Second, we investigated the effect of latent size on model performance. We found that the performance of the model increased with latent size up to a certain point and declined after that point. Hence, finding the appropriate latent size can improve performance even when a model's structure is fixed.

Through this study, several limitations of the proposed stacked autoencoder (SAE) model were identified. First, the proposed SAE model utilizes only container vessel data. These data may differ from those obtained from other types of ships, such as LNG carriers and bulk carriers. This pilot study focused on performing comparative experiments using confusion matrices and evaluation metrics to evaluate the ship state classification performance according to changes in the parameters of SAE models using actual ship data. Therefore, the direct discussion that is possible based on the current results is limited. In order to improve the performance of the ship state classification model in the future, a comparative study with deep learning-based classification models using various structures that are currently used in other fields is needed.

Second, we found that the classification performance for the Stand By state was low. The model is affected by the balance of the data, and the quantity of data for the Stand By state was very small in this study. This class imbalance problem can be addressed in three ways: data-level, algorithm-level, and hybrid methods [37]. Data-level methods apply various data sampling techniques that try to create a balanced distribution in the training dataset. Algorithm-level (cost-sensitive) approaches focus on diminishing the bias toward majority classes [38]. Hybrid methods combine the benefits of the previous two types of methods and minimize their weaknesses to improve classification model performance [39]. By adopting these methods, the problem of imbalanced data can be solved.
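As an illustration of the data-level approach, minority classes can be randomly oversampled with scikit-learn's `resample`; the class counts below are made up and do not reflect the paper's dataset:

```python
import numpy as np
from sklearn.utils import resample

rng = np.random.default_rng(0)
X = rng.random((120, 4))
y = np.array([0] * 80 + [1] * 10 + [2] * 30)   # class 1 ("Stand By") is rare

target = int(max(np.bincount(y)))              # grow every class to the majority count
X_parts, y_parts = [], []
for cls in np.unique(y):
    Xr, yr = resample(X[y == cls], y[y == cls],
                      replace=True, n_samples=target, random_state=0)
    X_parts.append(Xr)
    y_parts.append(yr)
X_bal, y_bal = np.vstack(X_parts), np.concatenate(y_parts)
# every class now has `target` samples
```

Random oversampling duplicates minority samples, so it can encourage overfitting to those samples; synthetic approaches (e.g., SMOTE) or cost-sensitive losses are common alternatives.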

In the commercial realm, no algorithm has been presented that can automatically classify current ship states. However, as research on smart ships, reducing ship fuel consumption, and eco-friendly operation is conducted, the SAE model proposed in this pilot study can provide smart ship researchers with ship data labeled by operating state. In addition, research can be conducted on reducing ship fuel consumption and improving eco-friendliness using the characteristics of the ship's state.

#### **6. Conclusions**

Artificial intelligence research using ship data is being actively conducted as the shipping market evolves. However, studies have not been performed on classifying ship operating modes, despite previous studies on using ship data to predict power loads. An SAE has the advantage of being able to perform effective classification by analyzing the features of the input data. Therefore, we conducted a pilot study on deep learning models that can classify ship operating modes using an SAE. Furthermore, experiments were performed to compare the performance according to changes in the structure of the SAE and changes in the size of its latent layer. The key points to be verified through research are as follows:


Through this pilot study, we found that our SAE-based deep learning model can be used to analyze ship operating modes. Furthermore, a model that can serve as a comparison baseline for classifying the actual operating state of a ship in the future was found. Based on this model structure, it will be possible to develop enhanced models. Although the classification performance for the Stand By mode was limited because of the imbalanced data, it was possible to propose an SAE model structure that maximizes the data classification performance for the In Port and At Sea modes. However, further research is required to address some limitations of this study. First, the issue of handling imbalanced data needs to be studied using real ship data; data-level techniques, algorithm-level methods, and hybrid methodologies can be utilized to find the most appropriate method to improve the classification model. Second, research to compare denoising, sparse, and stacked autoencoder models could be carried out to improve classification performance and establish which autoencoder-based model best interprets the features of real ship data.

**Author Contributions:** Conceptualization, methodology, and software, J.-Y.K.; project administration, funding acquisition, J.-S.O. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was supported by the Korea Institute of Marine Science and Technology Promotion (KIMST), funded by the Korea Coast Guard (20190460).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
