1. Introduction
The maritime industry underpins world trade, transporting around 90% of goods by volume and 70% by value [1]. Apart from cargo transport, the cruise ship industry has exhibited substantial growth over the last decade [2]. To support the functions and improve the sustainability of the modern cruise industry, highly sophisticated cruise ships have been designed and built, which employ advanced propulsion and power plant systems, compartment arrangements and exterior designs. It is widely acknowledged that modern cruise ships are the most technologically sophisticated of all ship types.
The cruise ship industry is a highly competitive market that has been developing rapidly, with both the size and the number of vessels constantly increasing [2]. As cruise ships carry large numbers of passengers and crew, it is paramount to ensure safety with respect to humans, assets, the environment and business. Potential power system malfunctions, such as a blackout, may lead to collision, contact or grounding, which, in turn, may result in significant loss of life as well as severe environmental pollution [3,4]. This may also severely damage the financial and social profile of the cruise ship operator and, in turn, of the whole cruise industry. The recent total blackout incident on board the Viking Sky [5], where all the generator sets in both engine rooms shut down, provides a representative example of the potential safety, financial and social implications associated with blackout events. In this respect, it is important to minimise the blackout probability in the design of cruise ship power plants, as well as to ascertain that adequate power will be available when required [6].
Another critical parameter is the significant cognitive load imposed by the ship systems on the cruise ships’ crews and operators with respect to the prevention of accidents and incidents [7]. Cruise ship power and propulsion systems are categorised as complex cyber-physical systems [8], consisting of a significant number of heterogeneous components that interact with each other in multiple ways. Such complexity leads to a significant number of alarms that the crew must constantly deal with [9], and it may hinder the identification of the critical alarms. This type of cognitive operator overload has been identified as one of the contributory factors to the Three Mile Island nuclear reactor accident [8,10]. In a recent blackout case on a cruise ship, the crew accepted and cleared low lubrication oil alarms (the reason for this was not reported), which, in combination with heavy roll and pitch, led to the loss of three of the four Diesel Generators (DGs) of the ship power plant [11].
One potential solution to the problem of cognitive load is to combine sensors and alarms with system safety models in specifically dedicated devices, rather than to use them independently from each other. The role of such automated safety monitoring devices “is to detect conditions that signal potentially hazardous disturbances and assist the operators of the system in the timely control of those disturbances” [12]. The idea of using sensor measurements for safety enhancement during operations was introduced during the 1980s. For example, Ref. [12] used those elements in condition monitoring systems.
Numerous studies have focused on integrating safety models with sensor measurements in other systems. Hu et al. [13] used a Bayesian network model to integrate condition monitoring and inspection data for the risk assessment of a nuclear power plant system, allowing for a more effective system health estimation compared with approaches based on traditional Event Trees. Jinqiu et al. [14] used Hazard and Operability study results to develop a dynamic Bayesian network for a gas turbine compressor system integrating sensor measurements. In a follow-up study [15], the previously developed dynamic Bayesian network was applied to the gas turbine compressor for the purposes of risk assessment. Aizpurua et al. [16,17] integrated prognostics estimations for a power distribution system with Boolean Driven Markov Processes and Stochastic Activity Networks. Gomes et al. [18] combined Fault Trees with prognostics-based remaining useful life estimation for an aircraft system. Pattison et al. [19] integrated dynamic Bayesian networks with random forests and memetic algorithms for the development of a real-time maintenance system for wind farms.
Nonetheless, very few studies have focused on ship propulsion systems. Abaei et al. [20] employed multinomial process trees and hierarchical Bayesian inference for predicting failures in machinery systems in the context of unmanned vessels. Eriksen et al. [21] applied a modified version of Failure Modes and Effects Analysis for more effective maintenance of unmanned ship systems. However, neither study considered the use of sensor measurements for dynamic risk analysis.
This study aims at the development and demonstration of a potential blackout monitoring system for a cruise ship power plant system integrated with sensor measurements. The novelty of this study stems from: (a) the application and advancement of the concept of integrating sensor measurements with safety methods to monitor and prevent blackout events in cruise ship power plants; (b) the development of a methodology for concept demonstration in a virtual simulation environment; and (c) the verification of the concept in this virtual environment. Whilst this study employs material from previous publications on cruise ship power plants, it advances the state of the art by demonstrating how existing safety methods can be fused with sensor measurements and integrated with reliability data to develop a safety monitoring system that predicts the time variations of the selected safety metrics.
The remainder of this study is structured as follows. Section 2 elaborates the proposed methodology for concept development and verification. Section 3 provides information about the investigated system. Section 4 presents and discusses the derived simulation results and provides recommendations for further developing the proposed safety system. Section 5 summarises the main findings and the conclusions of this study.
2. Materials and Methods
2.1. Methodology Overview
An overview of the methodology followed for the development and validation of the proposed blackout monitoring system is provided in Figure 1. The first step includes the development of a safety model suitable for blackout monitoring of the investigated cruise ship power plant system. In the second step, the parameters that can be monitored using sensors from the investigated cruise ship monitoring system are identified. During the third step, the failure rates of the investigated system components are estimated based on sensor measurements. In the fourth step, the methodology for fusing the selected system parameters/sensor measurements with the developed safety models and existing reliability data is presented. In the fifth step, criteria and metrics for the dynamic analysis of the observed situations are provided. In the sixth step, the system is simulated in a virtual environment and the concept is validated.
2.2. Step 1—Development of Safety Model
The first step of the methodology includes the development of a suitable safety model representing the system operation. In general, various safety analysis methods can be employed for this purpose, including Fault Tree Analysis, Hazard and Operability studies, Bayesian Networks [22] and Boolean-Driven Markovian Processes [23]. Other methods/software tools, such as Hip-HOPs [24], COMPASS [25] or MADe [26,27,28], could also be used for automatically deriving Fault Trees or Dynamic Fault Trees. This study employs the Combinatorial Approach to Safety Analysis (CASA) method, which is presented in [29,30,31]. The advantage of the CASA method is that it captures the dynamic and software-intensive character of cyber-physical systems more accurately than classical Fault Tree Analysis [29]. On the other hand, CASA results in a very extensive depiction of the system top event and is labour-intensive. The other safety methods have several limitations. Hazard and Operability studies do not relate the various independent hazardous events to each other, whilst there is no clear guidance for the development of Bayesian Networks. Methods such as Boolean-Driven Markovian Processes, Hip-HOPs, COMPASS or MADe do not properly capture the software-intensive character of cyber-physical systems. Therefore, the use of Fault Trees developed by employing the CASA method is considered advantageous for the safety analysis of diesel-electric power plant (DEP) systems, as reported in [30].
2.3. Step 2—Selection of the Monitored Parameters and Reliability Data
The following criteria are employed for selecting the measured parameters to be included in the developed automated safety monitoring system:
Measured parameters that sufficiently and effectively depict/represent the actual system health based on the pertinent literature.
Measured parameters that represent the system configuration and power demand, e.g., operating DG set(s).
Measured parameters monitored by the existing ship alarm and monitoring system.
Measured parameters from the ship plant critical components, as identified from previous safety analyses or accident investigation data.
In addition to the measured parameters, a number of failure rates are required, selected according to their availability in the relevant databases. These failure rates are used in conjunction with the sensor measurements to estimate the components’ failure rates. The databases, such as OREDA [32], are selected based on their relevance to the system, availability, trustworthiness and publication date. The proposed blackout monitoring system also incorporates the maintenance inspection intervals and the actual inspections implemented for the components.
2.4. Step 3—Estimation of Failure Rates Using Sensors Measurements
For the components whose safety metrics are monitored using sensor measurements, Health Indexes ($HI_i$) are estimated to depict the performance and health status of the ith component [33,34,35,36]. The $HI_i$ is estimated according to the following equation:
$$HI_i = \frac{f_i - f_i^{a}}{f_i^{n} - f_i^{a}} \qquad (1)$$
where $f_i$ represents a feature of the system ith component, $f_i^{a}$ is the feature value when the component fails (this can be the value of an activated component failure alarm), and $f_i^{n}$ denotes the feature value under normal conditions (without faults). Features are variables indicative of the components’ health status [37]. The considered features ($f_i$) can be the component temperature and/or pressure, a parameter estimated based on vibration analysis, or a combination of physical parameters that can be considered a reliable representation of the component health status. $HI_i$ can be estimated based on the physical parameters monitored for a system component in real time or at periodic intervals. In real applications, the calculation of $HI_i$ must be updated each time a significant difference exists between the initial feature value and the observed average value calculated over a specific time window. This time window depends on each component failure mode related to the specific $f_i$. Preferably, $f_i$ is a physical parameter monitored by the existing alarm and monitoring system.
According to Equation (1), the ith component is fully functional when $HI_i = 1$ and fails when $HI_i = 0$, whilst intermediate values of $HI_i$ indicate degrading performance of the ith component. If the feature $f_i$ exceeds the alarm limit $f_i^{a}$, the component is also considered faulty; if $f_i$ lies beyond the normal value $f_i^{n}$, on the side away from $f_i^{a}$, the component is considered healthy. The $f_i^{n}$ is defined as the normal value of $f_i$ that is observed during operation. Usually, this information can be retrieved from operation and maintenance manuals. Under real conditions, $f_i^{n}$ can vary depending on the desired output from a system, e.g., the engine load will affect the normal exhaust gas temperature; however, it is considered static herein.
Based on the estimated component $HI_i$, and in the absence of any other information, it is assumed that the component failure rates retrieved from the database used ($\lambda_i$) can be updated according to the following equation:
$$\lambda_i^{s} = \lambda_i^{HI_i} \qquad (2)$$
The index $s$ denotes the failure rate estimated based on sensor measurements. The working assumption behind Equation (2) is that the closer the feature is to the alarm or failure threshold, the higher the probability that a failure will occur in the next time period. It is also expected that the relationship between time and $HI_i$ is exponential, as lower values of the component health index correspond to a much lower component remaining useful life [38,39]. This can be viewed as a rather conservative approximation of the component fault growth trend. Using $HI_i$ as the exponent in Equation (2) ensures a smooth, exponential transition between normal and failure conditions. In addition, the boundary conditions are satisfied. When $HI_i$ equals 1, $\lambda_i^{s}$ equals the initial component failure rate $\lambda_i$, which is the only information available at the beginning about the failure rate. The ith component is considered faulty when the alarm limit is reached ($HI_i = 0$), which provides a failure rate according to sensor measurements equal to 1 h$^{-1}$.
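As an illustration of Equations (1) and (2), the following Python sketch computes a health index and the corresponding sensor-based failure rate. It is a minimal sketch under stated assumptions, not the implementation used in this study: the health index is assumed to be a linear interpolation between the normal and the failure (alarm) feature values, clipped to the range [0, 1], and the function names and numerical values (a temperature-like feature and a database failure rate of 1e-4 per hour) are hypothetical.

```python
def health_index(feature, normal, failure):
    """Assumed linear health index: 1 at the normal value, 0 at the failure/alarm value."""
    hi = (feature - failure) / (normal - failure)
    # Clip: beyond the alarm limit the component is treated as faulty (0);
    # beyond the normal value on the healthy side it is treated as healthy (1).
    return max(0.0, min(1.0, hi))


def sensor_failure_rate(database_rate, hi):
    """Failure rate (1/h) obtained by using the health index as exponent of the database rate."""
    return database_rate ** hi


if __name__ == "__main__":
    # Hypothetical values: normal 80, alarm 120, measured 100.
    hi = health_index(feature=100.0, normal=80.0, failure=120.0)
    print(hi)                             # 0.5
    print(sensor_failure_rate(1e-4, hi))  # approximately 1e-2 per hour
```

With these hypothetical numbers, a feature halfway between its normal and alarm values raises the failure rate by two orders of magnitude, which reflects the conservative character of the exponential update.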
2.5. Step 4—Integration of Sensor Measurements Estimation and Database Data
To integrate the component failure rate with the health status estimated from the measured data, the following equation is used to calculate the actual failure rate of the ith component ($\lambda_i^{act}$), employing the weight ($w_i$) assigned by the user or expert to the different information sources, as proposed in [36]:
$$\lambda_i^{act} = \left(\lambda_i^{s}\right)^{w_i} \, \lambda_i^{\,1-w_i} \qquad (3)$$
where $\lambda_i^{s}$ is the failure rate estimated from the sensor measurements and $\lambda_i$ is the failure rate retrieved from the database.
The logic behind Equation (3) is that the expert/user can have different trust levels in the information available from the various measurements (sensors) and historical databases. Full reliance on the measured parameters is denoted by w = 1, whereas w = 0 denotes no reliance on the measured parameters. The parameter w in this study is common to all the components, but it can also be component specific; hence, it is denoted as $w_i$.
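The weighting logic can be sketched in Python as follows. The log-linear (geometric) form used here is an assumption chosen because it satisfies the two boundary cases stated above (w = 0 returns the database failure rate, w = 1 returns the sensor-based failure rate); the function name and the numerical values are hypothetical.

```python
def fused_failure_rate(database_rate, sensor_rate, w):
    """Assumed log-linear fusion of the database and sensor-based failure rates (1/h)."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    # w = 0: rely only on the database; w = 1: rely only on the sensor-based estimate.
    return (sensor_rate ** w) * (database_rate ** (1.0 - w))


if __name__ == "__main__":
    # Hypothetical rates: 1e-4 per hour from a database, 1e-2 per hour from sensors.
    for w in (0.0, 0.5, 1.0):
        print(w, fused_failure_rate(1e-4, 1e-2, w))
```

Under this assumed form, an intermediate weight interpolates between the two sources on a logarithmic scale, so w = 0.5 yields the geometric mean of the two rates.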
2.6. Step 5—Dynamic Analysis
The warning levels are determined using a reference $P_{top}^{ref}$, where $P_{top}$ denotes the probability of the top event. In this study, the reference $P_{top}$ ($P_{top}^{ref}$) is the geometric mean for the orange level. The other warning levels are considered to have a probability one or two levels higher or lower than the reference level. A logarithmic scale is employed, as the relationships in Equations (2) and (3) are exponential. This is also in line with maritime regulations, which recommend separating intolerable, tolerable and negligible risk on a logarithmic scale [40]. In this respect, the warning levels described in Table 1 were developed. The reference $P_{top}$ can be set based on statistics or using expert opinion.
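The logarithmic banding of warning levels can be sketched as below. This is an illustrative sketch only: it assumes that each band is one decade wide, that the reference probability is the geometric centre of the orange band, and that a green level exists below yellow; the actual limits are those of Table 1, and the function name and the reference value are hypothetical.

```python
import math


def warning_level(p_top, p_ref, decades_per_band=1.0):
    """Classify a top event probability relative to the reference on a log10 scale."""
    # Offset of p_top from the reference, measured in band widths.
    offset = math.log10(p_top / p_ref) / decades_per_band
    if offset >= 0.5:
        return "red"
    if offset >= -0.5:
        return "orange"  # the reference sits at the geometric centre of this band
    if offset >= -1.5:
        return "yellow"
    return "green"       # assumed lowest level


if __name__ == "__main__":
    p_ref = 1e-5  # hypothetical reference probability
    for p in (1e-4, 1e-5, 1e-6, 1e-8):
        print(p, warning_level(p, p_ref))
```

Working with the logarithm of the probability ratio keeps the band limits symmetric around the reference, which is the property implied by using a geometric mean.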
Based on Table 1, the system levels requiring intervention can be defined. Intervention is proposed to take place in the red or orange levels. The identification of safety enhancement actions during operations is supported by employing appropriate importance metrics. The importance metrics can be used to identify the most important potential failures and, therefore, to prioritise the rectification actions. The Birnbaum ($I^{B}$) and Fussell-Vesely ($I^{FV}$) importance metrics are used herein, as they are extensively employed for importance analyses and have a clear physical meaning. High values of the Birnbaum importance metric indicate components that significantly affect the Fault Tree top event probability. Therefore, the degradation of these components must be carefully monitored. $I^{B}$ can also be used to assess the top event sensitivity to some operating parameters, such as the number of operating DG sets or the DG set load. The Fussell-Vesely importance metric can be used to identify the components whose failure is most likely to occur and lead to the blackout. A higher value of $I^{FV}$ for a component, compared to other components, means that the top event is more likely to result from the failure of this component than from the failure of others.
The Birnbaum importance measure is estimated according to the following equation:
$$I_i^{B} = \frac{\partial P_{top}}{\partial p_i}$$
where $P_{top}$ is the top event probability and $p_i$ is the probability of the ith basic event calculated by using the actual failure rate $\lambda_i^{act}$.
This study also adopted an alternative version of the Birnbaum importance metric that accounts for the plant operating parameters, such as engine load or number of connected DG sets. This metric is used to identify if a reconfiguration of the power plant is required to reduce the top event probability. Such reconfiguration may include the starting up of an additional DG set. This metric is calculated according to the following equation:
The following measures are considered as small changes in the operating parameter values: reducing the number of operating DG sets by one unit, slightly increasing the DG set load, and reducing the number of connected electrical power consumers.
A time-averaged metric ($\bar{I}_i^{B}$) is used to estimate the importance of each basic event in the Fault Tree based on the $I_i^{B}$ values at different time steps and is estimated as follows:
$$\bar{I}_i^{B} = \frac{1}{N}\sum_{j=1}^{N} I_{i,j}^{B}$$
where $j$ denotes an importance estimation number for the identified components, whereas $N$ denotes the maximum number of implemented importance estimations. In this way, $\bar{I}_i^{B}$ is an averaged value of $I_i^{B}$ and depicts the averaged criticality of a component.
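The Birnbaum metric and its time average can be illustrated with a deliberately small example. The sketch below assumes a toy Fault Tree, TOP = (A AND B) OR C, with independent basic events, which is not the Fault Tree of this study; all names and probabilities are hypothetical. For independent events the top event probability is linear in each basic event probability, so the derivative with respect to that probability equals the difference between the top event probabilities with the event set to occur and not to occur.

```python
def top_event_probability(p):
    """Toy fault tree with independent basic events: TOP = (A AND B) OR C."""
    p_and = p["A"] * p["B"]
    return 1.0 - (1.0 - p_and) * (1.0 - p["C"])


def birnbaum(p, event):
    """Birnbaum importance: P(TOP | event occurs) - P(TOP | event does not occur)."""
    return (top_event_probability({**p, event: 1.0})
            - top_event_probability({**p, event: 0.0}))


def time_average(values):
    """Arithmetic mean of the importance values obtained at N estimation times."""
    return sum(values) / len(values)


if __name__ == "__main__":
    # Hypothetical basic event probabilities at two estimation times.
    snapshots = [
        {"A": 0.10, "B": 0.20, "C": 0.01},
        {"A": 0.10, "B": 0.40, "C": 0.01},
    ]
    per_step = [birnbaum(p, "A") for p in snapshots]
    print(per_step, time_average(per_step))
```

In this toy tree, event C has the highest Birnbaum importance because it triggers the top event on its own, whereas the importance of A grows as its redundant partner B degrades.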
The Fussell-Vesely importance measure is estimated by using the following equation:
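A time-averaged metric ($\bar{I}_i^{FV}$) is used to estimate the importance of each basic event based on the $I_i^{FV}$ values at different time steps; this is estimated by the following equation:
$$\bar{I}_i^{FV} = \frac{1}{N}\sum_{j=1}^{N} I_{i,j}^{FV}$$
where $j$ and $N$ denote the importance estimation number and the maximum number of implemented importance estimations, respectively. The use of $\bar{I}_i^{FV}$ has a similar purpose to that of the time-averaged Birnbaum metric.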
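For illustration, the Fussell-Vesely metric can be computed on a toy example as sketched below. The sketch assumes one commonly used form of the metric, namely the fractional reduction of the top event probability when the basic event is made impossible; this form may differ from the exact expression used in this study. The toy Fault Tree, TOP = (A AND B) OR C with independent basic events, and all names and probabilities are hypothetical.

```python
def top_event_probability(p):
    """Toy fault tree with independent basic events: TOP = (A AND B) OR C."""
    p_and = p["A"] * p["B"]
    return 1.0 - (1.0 - p_and) * (1.0 - p["C"])


def fussell_vesely(p, event):
    """Assumed form: fraction of the top event probability that involves the event."""
    p_top = top_event_probability(p)
    p_without = top_event_probability({**p, event: 0.0})
    return (p_top - p_without) / p_top


if __name__ == "__main__":
    # Hypothetical basic event probabilities.
    p = {"A": 0.10, "B": 0.20, "C": 0.01}
    for event in ("A", "B", "C"):
        print(event, fussell_vesely(p, event))
```

In this example the events A and B contribute about two thirds of the top event probability, even though C has the higher Birnbaum importance, which shows why the two metrics are used together.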
The importance measures indicate which components’ failures/operating parameters must be monitored and controlled. Based on that, recommendations for the system safety enhancement can be provided. Examples of such recommendations include the switching over to a healthier DG set (allowing for performing maintenance and repair actions to the degraded DG set), increasing the number of operating DG sets, or reducing the propulsion motors load (speed).
2.7. Step 6—Simulation in Virtual Environment
In this step, the relevant adjustments are implemented to the safety model developed in step 1 to allow for the dynamic estimation of the top event probability in a virtual environment. These adjustments are not applied to the structure of the safety model but to its basic nodes, and are delineated in Table 2. Some of the Fault Tree basic events developed in step 1 are transformed into suitable Markovian processes. This makes the developed Fault Tree similar to a Boolean-logic Driven Markovian Process (BDMP) [23]. The use of Markovian processes was considered necessary to depict some dynamic features of the investigated system that cannot be represented in the Fault Tree. The required input for this step of the analysis includes the plant operational data, such as the components in operation, the components’ maintenance and testing intervals ($T$), maintenance rates ($\mu$), component failure rates ($\lambda$), the beta factor of the Weibull distribution ($\beta$), and the probability of failure on demand for the software components ($PFD$). $\Delta t$ denotes the predicted time horizon at each time $t$.
The use of these equations constitutes an improvement to the CASA calculations, as they allow for estimating the time variations of the selected safety metrics, compared to the static predictions of the previously presented approaches [29,30].
For simulation purposes only, it was assumed that the feature value of the ith component evolves from its normal value as a function of a degradation parameter and the time of the last maintenance of the component. A noise term is introduced in the analysis to account for the sensor’s measurement uncertainty. However, the actual fault growth curve can differ significantly from the assumed curve and depends on the component operating conditions. These assumptions are used only for simulation purposes.
The component failure rate was assumed to be equal to half of the inverse of the maintenance inspection interval. This assumption was made based on the observation that preventive maintenance in maintenance manuals is quite often scheduled at half of the component useful life.
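The simulation assumptions of this step can be sketched as follows. The failure rate follows the stated rule of half the inverse of the maintenance inspection interval. The feature trend, in contrast, is only an assumed illustration: a linear growth since the last maintenance with Gaussian noise is used here as a placeholder, and it is not the fault growth curve of this study. All function names, parameters and numerical values are hypothetical.

```python
import random


def failure_rate_from_interval(inspection_interval_h):
    """Failure rate (1/h) taken as half of the inverse of the inspection interval (h)."""
    return 0.5 / inspection_interval_h


def simulated_feature(t, normal, degradation, t_last_maintenance,
                      noise_std=0.0, rng=None):
    """Assumed linear fault growth since the last maintenance, plus Gaussian sensor noise."""
    rng = rng or random.Random(0)
    # The feature starts at its normal value right after maintenance.
    trend = normal + degradation * max(0.0, t - t_last_maintenance)
    return trend + rng.gauss(0.0, noise_std)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed for repeatable noise
    print(failure_rate_from_interval(5000.0))  # approximately 1e-4 per hour
    for t in (0.0, 100.0, 200.0):
        print(t, simulated_feature(t, normal=80.0, degradation=0.01,
                                   t_last_maintenance=0.0, noise_std=0.5, rng=rng))
```

The simulated feature values can then be passed to a health index calculation to exercise the monitoring chain without real sensor data.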
4. Results
4.1. Step 1—The Developed Safety Model
The model for the blackout monitoring system simulation was developed in the Matlab/Simulink environment by modelling all the relevant components of the major DEP subsystems: DG sets, Propulsion Motors (PMs), Engine Room (ER) components, Bow Thrusters (BTs) and Switchboards (SWs). The basic event probability for each component as well as the components’ operating status are used in the Fault Tree calculations. The employed Fault Tree structure is not presented, as it is too extensive; detailed information about the developed Fault Tree model can be found in [30]. This Fault Tree calculates the safety metrics in a static manner. However, with the adjustments described in step 6, the developed model is capable of calculating the time variations of the selected safety metrics. The interface of the developed Fault Tree (FT) model in Simulink is provided in Figure 3.
4.2. Step 2—Selection of the Monitored Parameters
The features that were selected for the health monitoring of several components of the investigated system, along with these components’ maintenance intervals (used to estimate the component failure rates), are provided in Table 4. These selected parameters are available in the existing ship alarm and monitoring system and are typically employed for monitoring the safety-critical components of the investigated power plant, as reported in [30].
Other parameters that are used as input to the FT model are: DG sets operating status, DG sets load, engine room operating status (whether in use or not; the ship has two engine rooms), number of operating DG sets in each engine room, hotel electric power demand, propulsion motors operating status and load, bow thrusters’ status and load, whether a DG set is starting, whether a propulsion motor is starting.
4.3. Steps 3−6—Simulation Results
Figure 4a illustrates the simulation results for the time variations of the power plant blackout probability (PoB) and the probability of the sudden loss of one DG set for the three investigated case studies, along with the inputs used: the numbers of operating power plant main components (DG sets, azipods, bow thrusters) and the time variation of the electric power demand. The two upper subplots of Figure 4a therefore depict the output, whilst the two lower subplots depict the input. Presenting input and output together facilitates the identification of correlations between them. Furthermore, to facilitate the analysis, the same results are presented in Figure 4b, excluding the time periods in which only one DG set operates.
As can be observed from Figure 4a, the PoB is in the red warning level for the time periods when only one DG set operates. During these time periods, the propulsion motors (azipods) do not operate, which indicates that the vessel is in harbour mode. Therefore, it is deduced that the PoB significantly increases in harbour mode, which is aligned with the findings of our previous study [30]. In this respect, the proposed system is developed to provide alarms whenever the warning zone is reached, in any ship operating mode.
The PoB warning threshold is also exceeded for short time periods when the propulsion motors (azipods) do not operate and the ship operates in its manoeuvring mode using its bow thrusters; the electric power demand is relatively low, whilst three DG sets operate. This can be attributed to the fact that power reduction functions for the bow thrusters are not available, and hence, this safety barrier cannot be considered in this ship operating mode. It can be inferred that the power reduction functions have a critical role in reducing the PoB, and hence, in ensuring the ship safety, which is in alignment with the findings of our previous study [30]. Thus, a potential way to improve the ship safety would be to include power reduction functions for the bow thrusters, provided that the cruise ship manoeuvring operation is not jeopardised.
The probability of the sudden loss of a DG set (PoDGloss) is also presented in Figure 4. It is observed that the PoDGloss time variation follows the same pattern as the time variation of the number of connected (operating) DG sets. This can be attributed to the fact that the more DG sets are connected to the system, the higher the probability that one of them will fail. However, the PoB correlates only slightly with the PoDGloss, which indicates that other parameters are more critical for the PoB than the number of connected DG sets, except for the time periods when only one DG set is connected. This indicates that the impact of this incident (DG failure) is lower. These parameters are identified in the following paragraphs. The PoDGloss is generally in the yellow region, which indicates that, from this perspective, no intervention is required by the crew.
The three case studies employed different values of the weight (0, 0.5 and 1) to exclude, partly use, or exclusively use the measured parameters for the calculation of the PoB time variations. It is deduced from the results of Figure 4 that the PoB time variations differ among these three case studies. This indicates that the incorporation of sensor measurements by the proposed blackout monitoring system influences the PoB results. This comparison demonstrates the additional impact of incorporating the sensor measurements on the estimated component failure rates. More importantly, the dynamic measurement of parameters (through sensors), such as the number of connected components and their loading conditions, significantly influences the estimation of the risk metrics.
The derived importance measures for the selected components are provided in Table 5, Figure 5 and Figure 6. These figures illustrate the importance measures calculated every 24 h; consequently, the variation of these importance metrics at intermediate time steps is not provided, and the lines connecting the points are included only to facilitate visualisation. The importance metrics in Table 5 are in alignment with the results reported in [30] and demonstrate that not only physical failures but also failures in software functions are categorised as critical for system safety. Additionally, the derived results demonstrate the importance of other hazardous events, e.g., arcs in switchboards. Based on the preceding considerations, the system operator must pay attention to the variation of the measured parameters and the degradation of the identified critical components, as small changes in these parameters can have a significant influence on the power plant PoB.
It is deduced from Figure 5 that the importance metric for each DG set varies with time. It must be noted that when a DG set is not connected to the ship electric grid (does not operate), its importance metric equals 0. The variation of the importance metrics depends on the usage of the DG sets and their operating loads, the system configuration and the components’ degradation. DG2, DG5 and DG6 exhibit a relatively low value of the importance metric at t = 96 h, as three DG sets operate at relatively low load; thus, it is unlikely that a failure in one of these operating DG sets will result in blackout. Therefore, it can be inferred that it would be possible to operate during this time period with only two DG sets at a higher load, which is expected to improve the energy efficiency of the investigated power plant whilst only slightly affecting the plant safety (in terms of the PoB). The considerable values of the importance metric for DG1, DG5 and DG6 at t = 144 h demonstrate a higher sensitivity to failures of these operating DG sets. This indicates situations where the crew must be alerted to potential failures or degradation of the operating DG sets’ health status, as explained in the next paragraph.
Figure 6 provides the importance metrics for different components of DG1. The time variations of these importance metrics are related to the operational characteristics, failure rates and degradation patterns of the identified critical components. The most critical components for DG1 operation were found to be the lubricating oil pumps as well as the high- and low-temperature cooling water pumps. These DG1 components require attention from the crew.
4.4. Discussion
The analysis of the results in the preceding section demonstrates that the investigated power plant parameters, such as the number of operating DG sets connected to the ship electric grid, the power plant components’ health status and the DGs’ operating conditions (load), affect the system safety. Addressing the power plant safety issues during operation requires accounting for all these parameters, which complicates the decision-making process for assuring the system safety. However, with assistance from proper tools for analysing the pertinent information, the effects of these parameters can be quantified and used to estimate the safety metrics for the investigated power plant, allowing potentially hazardous conditions to be identified and treated more effectively. With the development of autonomous systems in the maritime industry and the replacement of human operations by smart systems, such monitoring tools are essential for ensuring system safety. In the context of fully autonomous ships, the presence of advanced and intelligent safety monitoring systems is a prerequisite.
The proposed concept for the development of the presented blackout monitoring system satisfies several criteria for automating the power plant safety monitoring and management [12]. It provides high-level functional alarms for the investigated cruise ship power plant based on the prevailing system operating conditions. It also allows for the classification of the alarms/failures to reflect their importance for the investigated power plant safety. Furthermore, it allows the operator to assess indirectly the impact of different failures on the ship functions and to select the components that need to be maintained or disconnected from the ship electric grid. Therefore, it can be inferred that this concept facilitates effective safety management for the power plants of cruise ships.
Nonetheless, the proposed monitoring system has some limitations. Future studies could focus on diagnostics development and on using shipboard measurements to develop diagnostics toolboxes. Prognostics algorithms could also be incorporated in the blackout monitoring system. Future studies could also investigate the impact of different maintenance and inspection intervals on the power plant safety. Methods for diagnosing sensor failures and measurement uncertainty could further enhance the functionality of the proposed monitoring system. However, the presented concept constitutes the first essential step towards the development of a fully automated blackout monitoring system.