Application of Machine Learning in the Identification and Prediction of Maritime Accident Factors

Maceiras, Candela; Cao-Feijóo, Genaro; Pérez-Canosa, José M.; Orosa, José A.

doi:10.3390/app14167239

Department of Navigation Science and Marine Engineering, University of A Coruña, Paseo de Ronda, 51, 15011 A Coruña, Spain

* Author to whom correspondence should be addressed.

Appl. Sci. 2024, 14(16), 7239; https://doi.org/10.3390/app14167239

Submission received: 20 July 2024 / Revised: 12 August 2024 / Accepted: 15 August 2024 / Published: 17 August 2024

(This article belongs to the Section Marine Science and Engineering)

Abstract

Artificial intelligence offers a new point of view on classical problems that could not previously be understood in depth, leaving gaps in each area of knowledge. Maritime accidents are one example: they are among the most widely recognised international problems, with clear consequences for the environment and for human life. From the beginning, statistical studies have shown that the typically sampled variables are not sufficient, because accidents are related to human factors that depend, in turn, on variables such as fatigue that cannot easily be sampled. In this research work, machine learning algorithms are applied to over 300 maritime accidents to identify the relationship between human factors and the main variables. The results showed that, for the Spanish Search and Rescue (SAR) region, compliance with the minimum number of crew members and ship length are the two variables most relevant to each accident, together with the characteristics of the ships. The accidents can be understood as three main groups that share a general tendency not to meet the minimum number of crew members and that differ in the year of construction of the ship. Finally, neural networks modelled the accidents with sufficient accuracy (determination factor higher than 0.60), which is particularly interesting in the context of a control system for maritime transport.

Keywords: maritime accidents; neural networks; clustering; random forest; human factor

1. Introduction

Maritime accidents are traditionally analysed through historical technical studies that identify risk situations. A more accurate mathematical model of these risks, grounded in accident theory [1], would help to prevent accidents. Models that consider both technical and human factors have already contributed to a decrease in the number of accidents [2], although some authors [3] note that, because ships have grown in size, a single incident can have catastrophic and long-term consequences. Moreover, the standards of the International Association of Marine Aids to Navigation and Lighthouse Authorities (IALA) provide recommendations [4], derived from probability studies [5,6], that countries can use to control risk in maritime transport. However, these models are limited to certain working conditions and need a better understanding of the human factor, which is claimed to be the cause of 75 to 96% of maritime accidents [7,8]. For that reason, authors such as Galierikova [9] studied how to incorporate the human factor into an accident investigation program, using the Human Factor Analysis and Classification System (HFACS) to assign each causal factor to one of nineteen sections, a step known as the coding process. Consequently, despite several research works on collisions over recent decades [10], no single model predicts all risks from the commonly considered variables. Regarding the influence of the human factor on vessel collisions, Vinagre-Ríos et al. [11] studied the circadian alterations that sunlight causes in bridge watchkeepers.
Other research has focused on specific navigation areas [12], highlighting the need for models when accident databases are incomplete. More recent works have presented statistical studies for the Strait of Istanbul [13].

These results show that the human factor is being investigated as the most relevant variable for preventing accidents on fishing vessels [14,15,16,17,18]. However, it is a complex factor that is difficult to understand from the variables typically considered in CIAIM reports [19]. These reports are issued by the Spanish national commission that officially investigates maritime accidents in the Spanish Search and Rescue (SAR [17]) region. All the reports are available on its website, and the information from 300 accidents was manually organised for subsequent mathematical analysis. Hasanspahic et al. [20] followed the same approach of investigating marine accidents from an official government database, in that case that of the UK Marine Accident Investigation Branch (MAIB). In that work, the HFACS method was used, and a multiple linear regression was obtained to determine the relationship between the number of accidents and the most common causal factors.

Although it is widely held, even as a myth according to some authors [21], that more than 80% of maritime accidents are related to the human factor, few studies have proposed novel models using current AI tools that quantify the specific weight and influence of each factor in the shipping industry.

In this sense, previous works took a new approach: an initial qualitative causal model was defined, and a Bayesian Belief Network (BBN) model was applied to the Spanish Search and Rescue (SAR [22]) region. This technique had previously been used in another sector, aviation, to model accident investigation reports for safety assessment [23]. The results showed that commonly used variables such as visibility and sea conditions were the least probable factors for defining accidents in this region, and statistical models were obtained.

In this research work, artificial intelligence algorithms are proposed to investigate and model these accidents with the aim of increasing the accuracy and the quantification of the variables related to each accident.

2. Materials and Methods

In the past, inductive statistics allowed curves to be fitted between the variables of a process; in recent years, artificial intelligence procedures have increased the accuracy of such fits. These AI procedures provide not only a classical description of the data but also new information and graphical representations. However, researchers must still interpret the new data and charts to gain an adequate understanding of the process. Consequently, classical statistical studies are completed first to identify the relations between variables. Machine learning algorithms such as random forest [24] then improve on those studies by quantifying the relevance of each variable in the process. Clustering analysis is another useful tool: it identifies groups of data that can sometimes be associated with certain values of a classification variable. Finally, several machine learning models are built and compared with classical curve-fitting procedures, with the aim of obtaining a model that can predict future accidents and suggest preventive actions. All these methods are described and applied in the following sections.

2.1. Database and Codification

The CIAIM [19] database describes the most relevant accidents through the variables classically considered, such as wind force and ship dimensions, among others. However, there is a lack of mathematical studies of this complex process, which relates objective parameters to the subjective ones commonly associated with the human factor.

The information was codified as shown in Tables A1–A8. This codification makes a quantitative, rather than qualitative, analysis possible. In addition, several histograms were used to check whether the data were normally distributed and therefore suitable for parametric statistical studies.

Once the codification was complete, a detailed study identified how much data could be used for statistical studies, that is, whether enough data were available for all the considered variables in each accident. As a result, 276 accidents were retained. When an accident involved two ships, enough information was available for only one of them. From these data, the most common values and the minimum and maximum of each variable were identified.
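The filtering step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: the field names (`ship_type`, `accident`, `year`, `wind_force`, `min_crew`) and the four records are hypothetical, and `None` is assumed to mark a value missing from a report.

```python
# Hypothetical coded records; None marks a value missing from the report.
records = [
    {"ship_type": 3, "accident": 6, "year": 1985, "wind_force": 4, "min_crew": 1},
    {"ship_type": 3, "accident": 8, "year": None, "wind_force": 3, "min_crew": 1},
    {"ship_type": 2, "accident": 4, "year": 1994, "wind_force": None, "min_crew": 2},
    {"ship_type": 3, "accident": 4, "year": 1994, "wind_force": 3, "min_crew": 2},
]


def complete_cases(rows, variables):
    """Keep only the rows that have a value for every listed variable."""
    return [r for r in rows if all(r.get(v) is not None for v in variables)]


def value_range(rows, variable):
    """Return (minimum, maximum, most common value) for one variable."""
    values = [r[variable] for r in rows]
    most_common = max(set(values), key=values.count)
    return min(values), max(values), most_common


variables = ["ship_type", "accident", "year", "wind_force", "min_crew"]
usable = complete_cases(records, variables)
print(len(usable), value_range(usable, "ship_type"))
```

Dropping incomplete records is the simplest option; it trades sample size for the guarantee that every retained accident has a value for every variable.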

2.2. ANOVA

As explained above, it is important to understand the type of data. Previous studies [25] showed that the data followed a normal distribution and were homoscedastic, so parametric analysis can be used. The first study identifies the relations between variables: a classical one-way ANOVA between the 24 variables and the type of accident was performed with SPSS Statistics version 28. This software was chosen because it had been used for a wide variety of statistical studies in previous works.
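For readers unfamiliar with the test, the F statistic of a one-way ANOVA can be computed by hand as the ratio of between-group to within-group variance. The sketch below is a plain-Python illustration, not the SPSS procedure used in the study; the three groups stand for a hypothetical variable (for example, wind force) split by accident type.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of each observation around its own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical values of one variable, grouped by accident type.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_b, df_w = one_way_anova_f(groups)
print(f_stat, df_b, df_w)
```

The p-value is then obtained from the F distribution with `df_b` and `df_w` degrees of freedom; a statistical package is normally used for that step.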

2.3. Clustering

Running several artificial intelligence algorithms requires a workstation with adequate RAM and processor cores; consequently, an HP Z820 workstation (Intel Xeon, 24 cores, 2.7 GHz, 128 GB RAM; Palo Alto, CA, USA) was used in this case study. The software used for curve fitting was the most recent version of Matlab 2024, chosen for its ability to connect data sampling (such as the Internet of Things) with its artificial intelligence (AI) toolboxes. Open-source alternatives such as Python could also be used for this work, but the possibility of later building a control system with Matlab Simulink was a further advantage that led us to select Matlab for this analysis.

Once the software and the workstation had been selected, different AI analyses were carried out. The first was clustering, which can be defined as a method of grouping similar objects into one cluster and dissimilar objects into another [26]. In particular, clustering is a machine learning tool that identifies groups of data points, each lying at a minimum Euclidean distance from its centroid. In a subsequent analysis, these groups can be associated with some of the analysed variables, highlighting their relevance.
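The idea of assigning each point to its nearest centroid is the core of k-means (Lloyd's algorithm). The source does not name the clustering algorithm, so the sketch below assumes k-means; the points are hypothetical two-dimensional data rather than the eleven-dimensional accident records.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Minimal k-means: returns (labels, centroids) for a list of tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return labels, centroids


# Hypothetical data: two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

In practice the variables should be scaled before clustering, since Euclidean distance is otherwise dominated by the variable with the largest numerical range (for example, year of construction).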

2.4. Neural Networks

Several curve-fitting processes were used in previous works, such as least-squares curve fitting and, for subsequent multivariable fitting, response surface modelling. However, these models had a low determination factor, which prevents their use for predictive applications in maritime accidents. In recent years, artificial intelligence procedures have greatly increased the accuracy of the proposed models, and neural networks in particular show clear advantages. A neural network can be defined as a complex system comprising nodes and links represented by neurons and their connections [27], and such a network can be used for curve fitting.

In this study, feedforward neural networks with 10 layers were used. Of the data, 70% was used for training and the remaining 30% was held out for validation and testing. These networks were used to model ship accidents, taking special care to keep the number of layers to a minimum to prevent overfitting and the resulting loss of predictive capacity.

3. Results and Discussion

In this section, the tools described above are applied to obtain new results and derive conclusions. First, Section 3.1 analyses the frequency and range of the most relevant variables. The relations between variables are then studied.

3.1. Database

As described above, 24 variables were obtained from each ship’s accident report. However, because information was missing from some reports, the number of variables was reduced to eleven to ensure that most of the 276 accidents had information on all the variables. The main variables analysed were the type of ship and its dimensions; the type of accident; the year of construction; and weather conditions such as wind force, sea conditions, visibility and nocturnality. Finally, whether the ship complied with the minimum number of crew members (“Yes/No”) was also included. The most common values of each variable are shown in Figure 1, Figure 2, Figure 3, Figure 4, Figure 5, Figure 6, Figure 7, Figure 8, Figure 9, Figure 10 and Figure 11.

From these figures, it can be concluded that the most common ship types are 2 and 3 (passenger and/or cargo vessels and fishing vessels, respectively) for both ships involved in an accident. The most common accident types are 2 and 10 (flooding and operational accidents). The construction year is typically close to 2000, the wind force at the time of the accident was most often a gentle breeze, and the sea was smooth. Swell values were evenly distributed across cases, and visibility was good in most cases. Accidents were evenly distributed between day and night across the 276 accidents analysed. Ship length took the lowest values for both ships, which is associated with fishing vessels. Whether the ships complied with the minimum number of crew members was analysed at the same time (code 1 for one ship and code 5 for two ships). Consequently, the most common accidents involved fishing vessels and passenger ships built around the year 2000 and were flooding or operational accidents under good weather conditions, which reinforces the conclusion that the accidents were due to operational actions and, therefore, to the human factor.

3.2. Identification of Principal Variables

Although eleven variables commonly used to describe ship accidents were selected, it is of particular interest to identify those most relevant to the type of accident. Some of the studies used classical statistical methods such as hypothesis tests (ANOVA), and others were based on artificial intelligence (AI) algorithms such as principal component analysis (PCA) or random forest. The PCA study only reduced the computation time and did not identify the most relevant variables, whereas random forest (200 decision trees) is a well-recognised procedure for improving on the information obtained from the hypothesis test.

3.2.1. ANOVA Analysis

In this section, a one-way ANOVA hypothesis test was used to identify the relation between the variables and the type of accident. The test used a significance level of 0.05, and the results are summarised in Table 1.

From Table 1, it was concluded that the type of accident is associated with the type of ship A, the construction year, wind force, sea conditions and the dimensions of both ships. The same holds for the number of crew members of ship A, the TRB (gross register tonnage) of ship A and whether it met the minimum crew requirement. To a lesser degree, the type of ship B was also associated with the type of accident.

It is worth highlighting that visibility and the crew members and TRB of ship B were not associated with the type of accident.

3.2.2. Random Forest Variables

As noted above, a random forest with 200 decision trees was fitted to show the relevance of each variable. Figure 2 shows the importance of each of the 11 variables for predicting the type of accident. The most relevant variable was whether the ship complied with the minimum number of crew members: it was 12 times more relevant than the other variables and was associated with operational accidents rather than with technical problems or bad weather. Once again, an insufficient number of crew members is associated with the human factor, which is a variable of special interest in preventing accidents on board. Other variables were associated with the type of accident, but less strongly than crew compliance: the ship dimensions (length and beam) and, to a lesser degree, the year of construction, wind force and nighttime period. This result agrees with the Bayesian networks developed in previous works and is more restrictive than the ANOVA study above.
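The source does not state which importance measure was used. One common choice for tree ensembles is the mean decrease in Gini impurity, which sums, over all splits on a variable, how much each split purifies the class labels. The sketch below shows that building block for a single split, assuming this measure; the variable names and values are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0 means a pure node)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(feature, labels, threshold):
    """Impurity removed by splitting the samples at feature <= threshold."""
    left = [y for x, y in zip(feature, labels) if x <= threshold]
    right = [y for x, y in zip(feature, labels) if x > threshold]
    if not left or not right:
        return 0.0  # the split separates nothing
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted


# Hypothetical coded data: accident type per record and two candidate predictors.
accident = [2, 2, 10, 10]
min_crew = [1, 1, 2, 2]    # separates the two accident types perfectly
wind_force = [3, 4, 3, 4]  # carries no information about the accident type

crew_gain = impurity_decrease(min_crew, accident, threshold=1)
wind_gain = impurity_decrease(wind_force, accident, threshold=3)
print(crew_gain, wind_gain)
```

A forest averages such decreases over every tree in which the variable is used, so a variable that repeatedly produces clean splits, as minimum crew compliance does here, receives a high importance score.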

As is common in machine learning studies, the relations between the variables are presented as a correlation matrix in Figure 3.

Figure 3 encodes the level of correlation between each pair of variables on a colour scale. Blue and green squares show a clear relation between ship length and beam, a consequence of common ship design criteria, which validates the reading of this matrix. To a lesser degree, sea conditions and wind force are related (yellow and blue squares). Although the random forest and ANOVA studies showed a slight relation between ship length and year of construction, it cannot be observed clearly in this matrix because small numerical differences are difficult to render as clearly distinct colours.
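Because colours hide small differences, it can help to inspect the pairwise values directly. The sketch below computes a matrix of Pearson correlation coefficients in plain Python; the choice of Pearson's coefficient is an assumption, since the source does not say how the matrix was computed, and the three columns are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    """Nested dict of pairwise correlations for a dict of named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical columns: beam grows linearly with length, wind force does not.
columns = {
    "length": [10, 20, 30, 40],
    "beam": [3, 5, 7, 9],
    "wind_force": [3, 1, 4, 2],
}
matrix = correlation_matrix(columns)
print(matrix["length"]["beam"], matrix["length"]["wind_force"])
```

Printing the numbers alongside the coloured chart makes weak relations, such as the one between ship length and year of construction, easier to judge.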

3.3. Clustering

Once the main variables associated with a ship accident have been identified, the next step is to identify groups of accidents with similar behaviour. Each accident is represented as a point in eleven dimensions, one per variable, and accidents at a minimum Euclidean distance from a common centroid form a cluster; the reason why each cluster arises must then be investigated. The Calinski–Harabasz index is commonly used to select an initial number of clusters suited to the data. This index is shown in Figure 4.

Each cluster must be analysed in order to relate the mathematical criterion to a physical process. Figure 5 shows the number of accidents that each cluster represents.
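The Calinski–Harabasz index is the ratio of between-cluster to within-cluster dispersion, each divided by its degrees of freedom, so higher values indicate compact, well-separated clusters. A minimal plain-Python sketch follows; the four points and the two candidate labelings are hypothetical.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a list of tuples."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def calinski_harabasz(points, labels):
    """Calinski-Harabasz index: (B / (k - 1)) / (W / (n - k))."""
    n = len(points)
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    k = len(clusters)
    overall = centroid(points)
    # B: dispersion of the cluster centroids around the overall centroid.
    between = sum(
        len(m) * math.dist(centroid(m), overall) ** 2 for m in clusters.values()
    )
    # W: dispersion of the points around their own cluster centroid.
    within = sum(
        math.dist(p, centroid(m)) ** 2 for m in clusters.values() for p in m
    )
    return (between / (k - 1)) / (within / (n - k))


points = [(0, 0), (0, 2), (10, 0), (10, 2)]
good = calinski_harabasz(points, [0, 0, 1, 1])  # groups nearby points
bad = calinski_harabasz(points, [0, 1, 0, 1])   # groups distant points
print(good, bad)
```

To choose the number of clusters, the index is evaluated for several candidate values of k and the value with the highest score is taken as a starting point.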

3.3.1. Clusters 1 and 2

As can be seen, cluster 1 contains the highest number of accidents. The figures represent the cluster number versus two main variables, such as compliance with the minimum crew members together with ship dimensions (beam), ship type or year of construction (Figure 6, Figure 7 and Figure 8). From Figure 6, it can be concluded that cluster 2 is related to accidents in which the minimum crew requirement is not met (code 2), associated with accident types 4 to 10. Consequently, it can be concluded that tugboats, yachts and recreational vessels are too often undermanned. Moreover, Figure 7 shows that clusters 1 and 2 are associated with capsizing accidents (6) of tugboats and auxiliary boats of the port service, commercial yachts and recreational boats. These clusters are also associated with the lowest beam dimensions (Figure 8) and recent years of construction (Figure 9).

3.3.2. Clusters 6 to 8

Clusters 6 to 8 are associated with accident type 1 (contact) in which the ships do not meet the minimum crew requirement (codes 3 and 4). They are also related to accident type 10 (operational) and, at a lower level, to types 1, 2, 3 and 4 for ship types 5 and 6 (commercial yachts and recreational boats), which are associated with a small beam in Figure 8. Finally, Figure 9 shows that these clusters are related to the oldest ships, for which contact or flooding occurs.

These clusters are the most common and show a general tendency not to meet the minimum crew requirement; they differ in the year of construction. The clusters in which the minimum number of crew members is met correspond to the remaining surfaces in the three figures. In general terms, each cluster is associated with a type of accident and with the number of crew members.

3.4. Neural Networks

Once the behaviour of these accidents is understood, the next step is to define a model that can predict the probability of an accident and to propose a control system that can prevent it by altering some of the main variables. Statistical curve-fitting processes developed in the past had a low determination factor. Using machine learning tools, a neural network was therefore trained to predict the type of accident. The software used was the most recent version of Matlab 2024. A feedforward neural network was trained, validated and tested with the sampled data, using three predictors (length, beam and minimum crew) to define the type of accident. In particular, a 10-layer neural network was used, and the training algorithm was stochastic gradient descent (sgd) with a learning rate drop factor of 0.1. The error of this network is shown in Figure 10.

From Figure 10, it can be concluded that most predictions of the type of accident are within an error of 2 points. To understand the behaviour of this neural network, Figure 11 shows the fit for the training, validation and test processes, for which 70% of the 300 accidents were randomly assigned to training, 15% to validation and 15% to testing.
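A random 70/15/15 partition of this kind can be sketched as follows. This is an illustration in Python rather than the Matlab routine used in the study; the function name `split_indices` and the fixed seed are assumptions made for reproducibility.

```python
import random


def split_indices(n, train=0.70, validation=0.15, seed=0):
    """Shuffle 0..n-1 and cut it into training, validation and test indices."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = round(n * train)
    n_val = round(n * validation)
    train_idx = indices[:n_train]
    val_idx = indices[n_train:n_train + n_val]
    test_idx = indices[n_train + n_val:]  # whatever remains goes to testing
    return train_idx, val_idx, test_idx


train_idx, val_idx, test_idx = split_indices(300)
print(len(train_idx), len(val_idx), len(test_idx))
```

Keeping the test indices untouched until the end is what allows the test determination factor to be read as an estimate of performance on unseen accidents.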

From Figure 11, it can be concluded that the neural network has adequate precision, with a determination factor of 0.64 in the training process and 0.60 overall; the small gap between the two indicates limited overtraining and supports the network as a predictive tool.
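Assuming that "determination factor" denotes the coefficient of determination R², it compares the squared prediction errors with the spread of the targets around their mean. A minimal sketch with hypothetical targets and predictions:

```python
def r_squared(targets, predictions):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_target = sum(targets) / len(targets)
    ss_residual = sum((t - p) ** 2 for t, p in zip(targets, predictions))
    ss_total = sum((t - mean_target) ** 2 for t in targets)
    return 1 - ss_residual / ss_total


# Hypothetical accident-type codes and model outputs.
targets = [1, 2, 3, 4]
predictions = [1.5, 2, 3, 3.5]
score = r_squared(targets, predictions)
print(score)
```

A value of 1 means perfect predictions and 0 means the model does no better than always predicting the mean, so 0.60 corresponds to explaining about 60% of the variance in the coded accident type.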

By increasing the number of relevant variables used to define the type of accident (year of construction, wind force, nighttime, length, beam and minimum crew), the accuracy and the determination factor both increased, as shown in Figure 12 and Figure 13.

From Figure 12, it can be observed that the errors are smaller and more concentrated around the target values, a clear improvement in the accuracy of the neural network.

From Figure 13, it can be observed that the determination factor of the training process is higher than before (0.67), as is the overall determination factor (0.63), while the validation and test values remain high, which indicates limited overfitting.

Once a neural network of sufficient accuracy had been obtained, the next aim was a three-dimensional view of the accidents and, subsequently, an artificial intelligence that indicates how near or far a ship is from an accident. To accomplish this, a 3D representation was generated with constant values of wind force 3 (gentle breeze), construction year 2000, daytime accident (1) and a beam of 20 m, in accordance with the highest frequencies in the figures above. Ship lengths from 10 to 300 m were then combined with the different minimum crew conditions (codes 1 to 5). In total, 1140 simulations were completed and represented as 3D charts, as shown in Figure 14.

From this figure, it can be concluded that values 1 and 5 of the minimum crew code, which correspond to meeting the requirement, are related to accident types such as list and stranded, and flooding and grounding, respectively. Moreover, when the ship length decreases, the type of accident tends towards a higher code, but it is independent of ship length away from codes 1 (meets), 2 (does not meet) and 3 (one of the two ships does not meet).
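The sweep described above amounts to evaluating the model on a grid of two inputs while the other four are held constant. The sketch below shows the mechanics only: `placeholder_model` is a hypothetical stand-in, not the trained network, and the 10 m length step is an assumption, so the number of grid points differs from the 1140 simulations reported.

```python
from itertools import product

# Inputs held constant, taken from the values quoted in the text.
FIXED = {"year": 2000, "wind_force": 3, "period": 1, "beam_m": 20}


def sweep(model, lengths, crew_codes):
    """Evaluate the model on every (length, crew code) pair with FIXED inputs."""
    return [
        (length, code, model(length_m=length, min_crew_code=code, **FIXED))
        for length, code in product(lengths, crew_codes)
    ]


def placeholder_model(year, wind_force, period, beam_m, length_m, min_crew_code):
    """Hypothetical stand-in for the trained network; returns an arbitrary value."""
    return min_crew_code + length_m / 1000.0


lengths = range(10, 301, 10)  # 10 m step is an assumption
crew_codes = range(1, 6)      # codes 1 to 5 of the minimum crew requirement
surface = sweep(placeholder_model, lengths, crew_codes)
print(len(surface), surface[0])
```

Each tuple in `surface` gives one point (length, crew code, predicted accident type) of the 3D chart; replacing the placeholder with the trained model yields the actual surface.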

Based on the clustering analysis and the neural network modelling for a particular region, it is possible to define a series of recommendations for identifying the probability of a nearby accident. For instance, such recommendations could specify the number of crew members that ensures fewer accidents.

Finally, this study has some limitations. The data belong to the CIAIM database for the SAR region, where the influence of weather conditions is blurred by the navigation requirements set for this area by the Spanish Ministry itself. Consequently, the mathematical and neural network models can be applied to civil ships that navigate this region. The main intention of this work was to define a methodology for predicting accidents in the SAR region that could be extrapolated to other regions; this extrapolation must be investigated and tested in future research.

Other limitations were the amount of data and the variables used. The variables are those typically found in most maritime accident reports, but further variables could probably be incorporated to obtain more accurate models of the human factor and its causal factors, for example, the experience and training of the crew members or the maintenance of the ship.

Despite this, the main results obtained could be indirectly associated with the human factor, which, in clear accordance with previous research works, is associated with most maritime accidents.

4. Conclusions

From these results, several conclusions can be drawn:

The main variable related to maritime accidents in the SAR region is compliance with the minimum number of crew members and, at a lower level, the ship dimensions, which are indirectly associated with the ship type.
The results are centred on the SAR region, and this methodology must be tested in other regions where weather conditions and new variables such as crew members’ experience may affect which variables are most relevant.
Three main clusters can be observed in Figure 6, Figure 7, Figure 8 and Figure 9 based on their three main colours: cluster 1 (white), cluster 2 (light green) and cluster 3 (dark green). They are associated with meeting the minimum crew requirement and are differentiated by the year of construction of the ship. Based on the 3D model, it can also be concluded that a more adequate definition of the minimum number of crew members may help to decrease the number of accidents.
A mathematical model based on feedforward neural networks can predict the expected type of accident at any moment from the year of construction, wind force, nighttime, length, beam and minimum crew, with a determination factor of 0.67 in training and 0.63 overall; new variables associated with the human factor could improve this result in future works.
An electronic control system based on this model could be of interest for predicting and preventing future maritime accidents in this region.

Author Contributions

Conceptualization, C.M., G.C.-F., J.M.P.-C. and J.A.O.; Methodology, C.M., G.C.-F., J.M.P.-C. and J.A.O.; Software, C.M., G.C.-F., J.M.P.-C. and J.A.O.; Validation, C.M., G.C.-F., J.M.P.-C. and J.A.O.; Formal analysis, C.M., G.C.-F., J.M.P.-C. and J.A.O.; Investigation, C.M., G.C.-F., J.M.P.-C. and J.A.O.; Resources, C.M., G.C.-F., J.M.P.-C. and J.A.O.; Data curation, C.M., G.C.-F., J.M.P.-C. and J.A.O.; Writing—original draft, C.M., G.C.-F., J.M.P.-C. and J.A.O.; Writing—review & editing, C.M., G.C.-F., J.M.P.-C. and J.A.O.; Visualization, C.M., G.C.-F., J.M.P.-C. and J.A.O.; Supervision, C.M., G.C.-F., J.M.P.-C. and J.A.O.; Project administration, C.M., G.C.-F., J.M.P.-C. and J.A.O.; Funding acquisition, C.M., G.C.-F., J.M.P.-C. and J.A.O. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Table A1. Ship type code.

Code	Ship Type
1	Extraction Platforms and Auxiliary Supply vessels
2	Passenger and/or cargo vessels
3	Fishing vessels
4	Fishing and Aquaculture Auxiliary Boats
5	Tugboats and auxiliary boats of the Port Service
6	Commercial Yachts
7	Recreational Boats
8	Vessels of Public Organizations (National or Autonomous Communities)
9	Provisional List of Vessels under Construction

Table A2. Type of accident code.

Code	Type of Accident
1	Contact
2	Flooding
3	Grounding
4	Collision
5	Fire
6	Capsizing
7	List
8	Stranded
9	Loss of Stability
10	Operational Accident

Table A3. Code of the minimum crew members requirement.

Code	One Ship Involved	Two Ships Involved
1	Meets	-
2	Does not meet	-
3	-	One does not meet
4	-	Two do not meet
5	-	Two meet

Table A4. Code of the visibility level.

Code	Visibility Level
1	Very Bad (less than 1000 m)
2	Bad (between 1000 m and 2 nautical miles)
3	Moderate (between 2 nautical miles and 5 nautical miles)
4	Good (more than 5 nautical miles)

Table A5. Code of the time period in which the accident occurs.

Code	Period of Time
1	Day
2	Night

Table A6. Code of the human factor influence.

Code	Human Factor Conditions
1	Human factor influences the accident
2	Human factor does not influence the accident
3	Doubtful

Table A7. Code of the wind velocity in accordance with the Beaufort scale.

Wind Force (Code)	Denomination	Velocity (km/h)	Velocity (knots)
0	Calm	0–2	Up to 1
1	Light air	2–6	1–3
2	Light breeze	7–11	4–6
3	Gentle breeze	12–19	7–10
4	Moderate breeze	20–29	11–16
5	Fresh breeze	30–39	17–21
6	Strong breeze	40–50	22–27
7	High wind	51–61	28–33
8	Gale	62–74	34–40
9	Strong gale	75–87	41–47
10	Storm	88–101	48–55
11	Violent storm	102–117	56–63
12	Hurricane force	>118	>64
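Coding a measured wind speed with this table reduces to a threshold lookup. The sketch below uses the upper bound in knots of each Beaufort force; rounding to whole knots and treating anything below 1 knot as calm are assumptions made here to close the gaps between the tabulated ranges.

```python
from bisect import bisect_left

# Upper bound in whole knots for Beaufort forces 0 to 11; above 63 is force 12.
UPPER_KNOTS = [0, 3, 6, 10, 16, 21, 27, 33, 40, 47, 55, 63]


def beaufort_code(speed_knots):
    """Return the Beaufort force (0-12) for a wind speed in knots."""
    whole_knots = round(speed_knots)  # the table is defined on whole knots
    return bisect_left(UPPER_KNOTS, whole_knots)


print(beaufort_code(8), beaufort_code(34), beaufort_code(70))
```

The same pattern, with different bounds, applies to the Douglas sea-state code and to the visibility levels.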

Table A8. Code of the sea conditions in accordance with the Douglas scale.

Sea Conditions (Code)	Douglas Scale	Wave Height (m)	Wave Height (ft)	Beaufort Scale
0	Calm (Glassy)	No wave		0
1	Calm (rippled)	0–0.10	0.00–0.33	1–2
2	Smooth	0.10–0.50	0.33–1.64	3
3	Slight	0.50–1.25	1.6–4.1	4
4	Moderate	1.25–2.50	4.1–8.2	5
5	Rough	2.50–4.00	8.2–13.1	6
6	Very rough	4.00–6.00	13.1–19.7	7
7	High	6.00–9.00	19.7–29.5	8–9
8	Very high	9.00–14.00	29.5–45.9	10–11
9	Phenomenal	14.00+	45.9+	12

Table A9. Sample records from the 300 accidents.

Ship Type	Accident	Year	Wind Direction	Wind Force	Sea	Visibility	Day/Night	Length	Beam	Gt. M	Crew	Minimum Crew	Human Factor Influence
3	6	1985	203	4	3	3	1	5	2	1	2	1	1
3	8	1967	338	3	2	3	1	5	2	1	1	1	0
3	6	1991	338	5	2	2	1	5	2	1	2	1	1
3	4	1994	180	3	2	4	2	5	2	1	1	2	0
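
To read a coded record back into plain language, the codes can be joined with the lookup tables of this appendix. The sketch below is illustrative only: the snake_case column names and the `decode_record` helper are assumptions, and only the accident type (Table A2), visibility (Table A4) and time period (Table A5) are decoded.

```python
COLUMNS = [
    "ship_type", "accident", "year", "wind_direction", "wind_force", "sea",
    "visibility", "day_night", "length", "beam", "gt_m", "crew",
    "minimum_crew", "human_factor_influence",
]

ACCIDENT_TYPES = {
    1: "Contact", 2: "Flooding", 3: "Grounding", 4: "Collision", 5: "Fire",
    6: "Capsizing", 7: "List", 8: "Stranded", 9: "Loss of Stability",
    10: "Operational Accident",
}
VISIBILITY_LEVELS = {1: "Very Bad", 2: "Bad", 3: "Moderate", 4: "Good"}
PERIODS = {1: "Day", 2: "Night"}


def decode_record(line):
    """Parse one whitespace-separated Table A9 row and attach text labels."""
    values = [int(value) for value in line.split()]
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(values)}")
    record = dict(zip(COLUMNS, values))
    record["accident_label"] = ACCIDENT_TYPES[record["accident"]]
    record["visibility_label"] = VISIBILITY_LEVELS[record["visibility"]]
    record["period_label"] = PERIODS[record["day_night"]]
    return record


# First sample row of Table A9.
first = decode_record("3\t6\t1985\t203\t4\t3\t3\t1\t5\t2\t1\t2\t1\t1")
print(first["accident_label"], first["visibility_label"], first["period_label"])
```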

Figure 1. Frequency of each variable in 276 accidents.

Figure 2. Importance of each variable to deduce the type of accident.

Figure 3. Importance of each variable to deduce the type of accident.

Figure 4. Number of clusters of the proposed data.

Figure 5. Relation between clusters and the number of accidents.

Figure 6. Representation of the cluster number versus the type of accidents and compliance with the minimum number of crew members.

Figure 7. Representation of the cluster number versus the type of accidents and ship type.

Figure 8. Representation of the cluster number versus the type of accidents and beam dimensions.

Figure 9. Representation of the cluster number versus the type of accidents and year of construction.

Figure 10. Error histogram of the trained neural network.

Figure 11. Curve fitting of the training, validation and test processes.

Figure 12. Error histogram of the trained neural network.

Figure 13. Curve fitting of the training, validation and test processes.

Figure 14. Three-dimensional chart of the type of accident versus minimum crew and ship length.

Table 1. Significance of 24 input variables and the output (type of accident).

Variable	Significance
Ship type	0.029
Year of construction	0.531
Wind force	0.000
Sea conditions	0.000
Swell	0.000
Visibility	0.584
Night	0.003
Length	0.000
Beam	0.000
Cause of the accident	0.000
Number of crew members	0.732
Comply minimum crew members	0.000
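
The values in Table 1 can be screened programmatically. The sketch below assumes they are ANOVA significance values (p-values) and applies the conventional 0.05 threshold; that threshold is an assumption here and is not stated in the table.

```python
# Significance values copied from Table 1.
SIGNIFICANCE = {
    "Ship type": 0.029,
    "Year of construction": 0.531,
    "Wind force": 0.000,
    "Sea conditions": 0.000,
    "Swell": 0.000,
    "Visibility": 0.584,
    "Night": 0.003,
    "Length": 0.000,
    "Beam": 0.000,
    "Cause of the accident": 0.000,
    "Number of crew members": 0.732,
    "Comply minimum crew members": 0.000,
}

ALPHA = 0.05  # assumed significance threshold

significant = [name for name, p in SIGNIFICANCE.items() if p < ALPHA]
not_significant = [name for name, p in SIGNIFICANCE.items() if p >= ALPHA]

print("Significant:", significant)
print("Not significant:", not_significant)
```

Under this assumed threshold, year of construction, visibility and number of crew members are the only listed variables that are not significant with respect to the type of accident.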

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
