Article

Smoke Detection of Marine Engine Room Based on a Machine Vision Model (CWC-Yolov5s)

1 Marine Engineering College, Dalian Maritime University, Dalian 116026, China
2 Collaborative Innovation Research Institute of Autonomous Ship, Dalian Maritime University, Dalian 116026, China
* Authors to whom correspondence should be addressed.
J. Mar. Sci. Eng. 2023, 11(8), 1564; https://doi.org/10.3390/jmse11081564
Submission received: 13 July 2023 / Revised: 4 August 2023 / Accepted: 4 August 2023 / Published: 8 August 2023

Abstract

According to statistics, about 70% of ship fire accidents occur in the engine room, owing to its complex internal structure and the variety of combustible materials stored there. Once a fire occurs, it is difficult to extinguish and seriously endangers the crew's lives and property. Therefore, a method that can detect fire in the engine room in real time is urgently needed. To address this problem, a machine vision model (CWC-YOLOv5s) is proposed, which identifies early fires through smoke detection. Firstly, a coordinate attention mechanism is added to the backbone of the baseline model (YOLOv5s) to enhance the perception of image feature information. The loss function of the baseline model is then optimized with wise intersection over union, which speeds up convergence and improves detection performance. Next, coordconv coordinate convolution layers replace standard convolution layers of the baseline model, which strengthens boundary information and improves regression accuracy. Finally, the proposed machine vision model is verified using the ship video system and a laboratory smoke simulation bench. The results show that the proposed model achieves a detection precision of 91.8% and a recall rate of 88.1%, which are 1.4% and 4.6% higher than those of the baseline model.

1. Introduction

Ships that sail the ocean for long periods are characterized by strong airtightness and narrow spaces [1]. Key areas such as engine rooms contain many circuit systems, inflammable and explosive dangerous goods, and a large amount of electronic equipment, and their structure is usually very complex. This equipment occupies a large space, and goods are stored very tightly [2]. Faults such as engine overheating and the short-circuiting of electronic equipment can easily trigger fires, which are then likely to spread to other areas and cause serious economic losses. When a ship is far from land, it is difficult to evacuate the people on board and transfer the goods in time once a fire occurs, which also greatly endangers personal safety. Additionally, a ship contains many high-temperature components that can ignite combustible materials in the cabin during sailing, greatly increasing the probability of fire [3]. Moreover, cracks in steel plates inevitably occur during navigation, and sparks generated during maintenance welding can also cause fires [4]. In the early stage of a ship fire, smoke appears earlier than flame, and it diffuses strongly, making it easier for a camera to detect. Therefore, it is essential to capture and detect smoke quickly and accurately in the early stages of a ship fire; this can greatly reduce the harm and loss caused by ship fire accidents [5].
Early ship fire smoke was detected by sensors, mainly by monitoring changes in physical quantities such as temperature and smoke particle concentration [6,7,8]. However, this method works well only in specific environments. In other settings, such as outdoors or other complex environments, the sensors are easily disturbed by other factors and cannot obtain real, effective images of the scene, and detection efficiency and accuracy are greatly reduced. With the development of computer vision technology, various algorithms based on image and video technology have been widely applied to fire smoke detection. In the early stages of this research, traditional ship smoke detection algorithms based on image and video technology mainly identified and analyzed the principal features of smoke, such as its shape, color and motion characteristics [9,10,11]. Töreyin et al. [12] used wavelet transform to detect the flame flicker period and the color transformation of the fire, and combined temporal and color information to reduce the false detection of objects with similar colors. Chen Rong et al. [13] used the two-dimensional maximum-entropy automatic threshold method to segment the fire image and then extracted suspicious areas to identify the flame. Shi Guangming et al. [14] initially identified the smoke in the image through the correlation between pixels and principal component analysis, and then separated the suspected region with a model obtained from pure smoke images. Such studies typically apply classification algorithms to one or more hand-crafted smoke features. However, these algorithms rely only on shallow feature extraction, which has low accuracy, is prone to misdetection, and generalizes poorly to new scenes and emergencies. In recent years, with the rapid development of deep learning image detection technology, deep learning has surpassed traditional manual feature extraction methods in many fields; it can extract more abstract and deeper features in ship smoke detection, giving the model better generalization [15,16,17]. Wu et al. [18] adopted the mainstream target detection models Faster R-CNN, SSD and YOLOv1 for flame detection and found that SSD had the best detection efficiency and the highest accuracy; on this basis, they proposed tiny-YOLO, a lightweight network for mobile devices. Li et al. [19] compared Faster R-CNN, R-FCN, YOLOv3, SSD and traditional computer vision methods in flame detection, and found that target detection algorithms are more accurate than artificial feature extraction algorithms; among them, YOLOv3 had the highest detection accuracy and the fastest speed. Huo et al. [20] added a convolutional path to the YOLOv4 backbone feature extraction network to broaden the backbone and enhance its feature extraction capability, and added a spatial pyramid pooling (SPP) module to the feature fusion layer to improve the detection of small targets. Xie et al. [21] added a channel attention module to the prediction head of the YOLOv4 network to improve smoke detection accuracy. Abdusalomov et al. [22,23,24] adopted a solution integrating the Internet of Things (IoT) and YOLOv5 to address the lack of research on the real-time monitoring of outdoor forest fires.
Although many scholars have applied current mainstream target detection models to smoke detection, many problems remain, such as insufficient generalization ability, poor anti-interference ability, slow detection speed and accuracy that still needs improvement. More in-depth research is therefore needed.
At present, relatively few existing ship smoke detection methods use machine vision technology, the available data sets are small, and existing algorithms still suffer from low detection speed, low detection accuracy, serious false and missed detections, insufficient generalization ability and poor anti-interference ability. To solve these problems, this paper proposes a highly flexible improved one-stage YOLOv5s model and compares it with mainstream algorithms, showing that the improved model is of great significance for image-based real-time ship smoke and fire warnings. It can detect accurately and rapidly in the early stages of a ship fire, protecting the lives and property of the people on board. The improvements are as follows:
(1)
First, the coordinate attention (CA) mechanism is integrated into the backbone part of the YOLOv5s network, which strengthens the feature extraction capability and improves the accuracy of smoke detection without adding network parameters.
(2)
Wise intersection over union (WIoU) is then used to replace the complete intersection over union (CIoU) loss function, which accelerates the model convergence speed and improves the regression accuracy.
(3)
Finally, coordinate convolution layers are added to the neck part of the YOLOv5s network structure, which strengthens feature extraction and fusion and thus improves the accuracy of smoke detection.

2. Materials and Methods

2.1. Experiment Dataset

An accurate deep learning network model requires a large amount of training data. The quality of the data set greatly affects the generalization ability and detection results of the model. However, existing public data sets rarely target the interior of a ship's engine room specifically; it is therefore necessary to collect and annotate data to create a ship smoke data set.
Smoke images recorded by cameras in ship engine rooms were searched for and downloaded from various platforms, and smoke images resembling the internal environment of an engine room were collected by photographing them directly. After sorting, 5449 images were obtained. As shown in Table 1, set 1 contains the smoke images collected in the ship's engine room, and set 2 contains smoke images from various indoor scenes. The labeling tool LabelImg was used to mark the location and corresponding growth stage of the smoke in the self-made data set. The data set is rich and diverse, covering smoke scenes in various cabin environments, so the model can fully learn the characteristics of smoke and generalize well to various scenes in the cabin, improving its robustness. The original data set is randomly divided into a training set, validation set and test set in the ratio 8:1:1, as scripted in the sketch below.
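To make the 8:1:1 division reproducible, the split can be scripted. The following is a minimal sketch, not the authors' actual tooling: it assumes images and their LabelImg YOLO-format label files share base names, and all directory paths are illustrative, since the paper does not describe its layout.

```python
import random
import shutil
from pathlib import Path

# Illustrative paths; the actual directory layout of the self-made
# smoke data set is not specified in the paper.
SRC = Path("smoke_dataset/images")
DST = Path("smoke_dataset/splits")

images = sorted(SRC.glob("*.jpg"))
random.seed(0)          # fixed seed so the split is reproducible
random.shuffle(images)

n = len(images)
n_train, n_val = int(0.8 * n), int(0.1 * n)
splits = {
    "train": images[:n_train],                    # 80%
    "val": images[n_train:n_train + n_val],       # 10%
    "test": images[n_train + n_val:],             # remaining 10%
}

for name, files in splits.items():
    out = DST / name
    out.mkdir(parents=True, exist_ok=True)
    for img in files:
        shutil.copy(img, out / img.name)
        label = img.with_suffix(".txt")   # LabelImg YOLO-format label file
        if label.exists():
            shutil.copy(label, out / label.name)
```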
Figure 1 shows the relative position and size distribution of smoke targets in the smoke data set; the darker the color, the denser the distribution in that region. Figure 1a shows the sample center points over the whole image: each box represents the occurrence of a sample, and the depth of the color reflects the frequency of occurrence. The results show that the positions of the samples are distributed relatively uniformly over the image. Figure 1b shows the ratio of the width and height of each sample to the whole image; the samples are concentrated along a ray at an angle of 45 degrees [25]. In general, the distribution and composition of the whole data set are relatively uniform.

2.2. Experimental Environment

The configuration of the experimental platform in this paper is shown in Table 2:
During model training, the learning rate is set to 0.1, the weight attenuation of the optimizer is set to 0.0005, and the momentum of stochastic gradient descent (SGD) is set to 0.937. After the training has been completed, the camera can be turned on to start real-time visual monitoring of the ship’s engine room to identify the smoke status.
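These settings can be reproduced against the standard Ultralytics YOLOv5 command-line interface. The following is a minimal sketch, not the authors' actual scripts: the hyperparameter keys (lr0, momentum, weight_decay) follow the repository's hyp.*.yaml convention, and the file names smoke.yaml and hyp.smoke.yaml, as well as the stock hyperparameter path, are assumptions.

```python
import subprocess
import yaml

# Start from a stock YOLOv5 hyperparameter file (path as in recent
# releases of the repository) and override the three values reported
# in Section 2.2.
with open("data/hyps/hyp.scratch-low.yaml") as f:
    hyp = yaml.safe_load(f)
hyp.update({"lr0": 0.1, "momentum": 0.937, "weight_decay": 0.0005})

with open("hyp.smoke.yaml", "w") as f:
    yaml.safe_dump(hyp, f)

# smoke.yaml is a hypothetical dataset description file; 300 epochs
# matches the training budget used in Section 4.1.
subprocess.run([
    "python", "train.py",
    "--data", "smoke.yaml",
    "--hyp", "hyp.smoke.yaml",
    "--weights", "yolov5s.pt",
    "--epochs", "300",
], check=True)
```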
The smoke experiment scene simulates an oil rag catching fire in the workshop of the ship's engine room. The whole experimental process is shown in four pictures: the initial normal state, the beginning of smoke, the increase in smoke and, finally, the emergence of fire. The overall scene is shown in Figure 2:

2.3. Evaluation Index

In this paper, precision, recall, inference time and mean average precision (mAP) are used to evaluate network performance. IoU is the ratio of the intersection to the union of the predicted and ground-truth regions. True positive (TP) is the number of correctly classified positive samples, true negative (TN) the number of correctly classified negative samples, false positive (FP) the number of negative samples wrongly classified as positive, and false negative (FN) the number of positive samples wrongly classified as negative. Precision reflects the detection performance of the network: it is the proportion of predicted targets that are correct, and is calculated by Equation (1). Recall is the proportion of all labeled objects that the model correctly predicts, and is calculated by Equation (2). The inference time is the time the network model takes to process one image. AP is the integral of the precision-recall curve, calculated by Equation (3). mAP measures recognition accuracy and is the average of the APs of all classes, calculated by Equation (4), where m is the number of categories. Two threshold metrics are used: mAP@0.5, with an IoU threshold of 0.5, and mAP@0.5:0.95, with IoU thresholds from 0.5 to 0.95. In addition, the model loss estimates the error between the model prediction and the true value. The loss function consists of three components: objectness confidence loss (obj loss), classification loss (cls loss) and localization loss (box loss).
$$\mathrm{Precision} = \frac{TP}{TP + FP} \quad (1)$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \quad (2)$$

$$AP = \int_0^1 \mathrm{Precision}(\mathrm{Recall}) \, d(\mathrm{Recall}) \quad (3)$$

$$mAP = \frac{1}{m} \sum_{i=1}^{m} AP_i \quad (4)$$
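To illustrate Equations (1)-(4), the sketch below computes precision, recall and AP from detection counts and from sampled points of the precision-recall curve. The padding and monotonic smoothing of the curve follow common practice; they are not implementation details given in the paper.

```python
import numpy as np

def precision_recall(tp, fp, fn):
    """Precision = TP/(TP+FP) and Recall = TP/(TP+FN), as in Eqs. (1)-(2)."""
    return tp / (tp + fp), tp / (tp + fn)

def average_precision(recall, precision):
    """AP as the area under the precision-recall curve (Eq. (3)),
    approximated numerically from sampled (recall, precision) points."""
    # Pad the curve so integration runs from recall 0 to recall 1.
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([1.0], precision, [0.0]))
    # Make precision non-increasing in recall (standard PR smoothing).
    p = np.maximum.accumulate(p[::-1])[::-1]
    return np.trapz(p, r)

def mean_average_precision(ap_per_class):
    """mAP = mean of the per-class APs (Eq. (4))."""
    return float(np.mean(ap_per_class))

# Example: one class with PR points sampled at rising recall.
ap = average_precision(np.array([0.2, 0.5, 0.8]), np.array([0.95, 0.85, 0.6]))
print(mean_average_precision([ap]))
```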

2.4. Framework

The design process of ship’s engine room smoke detection is divided into three parts, including the construction of the smoke data set, the establishment of the improved YOLOv5s model and the verification and application of the proposed model. The construction of the smoke data set involves smoke scene data collection. LabelImg software is used to label the data, and data enhancement is used to construct the data set. The establishment of the improved YOLOv5s model includes model optimization, parameter setting, model training, and obtaining the optimal weight. The verification of the model includes the input of the video image, the loading of the improved model, and the output of the test results. The flow chart of the whole model is shown in Figure 3:
The overall framework of this paper is as follows:
(1)
Section 2 mainly introduces the collection of experimental data, the establishment and processing of data sets, as well as the experimental environment and evaluation indicators required in this study.
(2)
Section 3 mainly shows the improvement of the YOLOv5s model by adding a coordinate attention mechanism, WIoU loss function and coordinate convolution layer to the YOLOv5s model (CWC-YOLOv5s).
(3)
Section 4 mainly analyzes and compares the experimental results, and compares them with the mainstream methods.
(4)
Section 5 provides a discussion and conclusions, including directions for future improvement and research prospects.

3. Proposed CWC-YOLOv5s Model

3.1. YOLOv5s Network Model

At present, computer-vision-based smoke detection and identification methods for marine engine rooms generally suffer from insufficient accuracy, slow detection speed, serious loss of extracted features, insufficient model generalization and high cost. Smoke detection models also tend to perform very differently on different data sets. With the continuous development of machine vision, current deep-learning-based smoke detection methods can be roughly divided into two categories: multi-stage target detection models based on region proposals, represented by R-CNN [26], and regression-based single-stage object detection models, represented by YOLO. Among existing target detection algorithms, YOLOv5 is widely applied because of its fast detection speed, strong adaptability and high accuracy. The YOLO family has developed from YOLOv1 to the current YOLOv5, and its detection speed and accuracy have greatly improved, even approaching those of multi-stage target detection methods [27]. Therefore, most scholars use YOLO models for smoke detection research. The YOLOv5 target detection model, as the current mainstream single-stage detector, is high-performing and lightweight, and meets the requirements of smoke detection [28]. It is therefore reasonable to select the YOLOv5 target detection model for smoke detection in this paper. However, to meet the actual demands of smoke detection in a ship's engine room, the network structure of YOLOv5 must be improved; experimental tests on different data sets are then performed to finalize the network.
This article focuses on improving YOLOv5 version 6.0. YOLOv5 provides five model scales of different widths and depths: n, s, m, l and x [29]. The YOLOv5s network, with its good balance of precision and speed, is selected as the baseline to improve. YOLOv5s is mainly composed of the input, backbone, neck and prediction head [30]. As shown in Figure 4, a K-means clustering method is used at the input end to generate new anchor box sizes for the data set [31]. The backbone network includes the C3-darknet (C3) module, the spatial pyramid pooling-fast (SPPF) module and convolutional layers. The neck is mainly composed of adding (concatenation) modules, C3 modules and upsampling modules. Finally, prediction is made through the prediction head, and the prediction result is the output. The C3 module is a simplified version of the network layer of a cross-stage partial network (CSPNet); CSPNet alleviates problems such as the repetition of gradient information in the backbone network and integrates gradient changes into the feature map, so the evolution to the C3 module makes the model more lightweight. The SPPF module is added at the end of the backbone network; compared with the spatial pyramid pooling (SPP) module, it replaces the parallel max-pooling layers with kernel sizes of 5, 9 and 13 by three serial 5 × 5 max-pooling layers, which accelerates detection. Finally, the CIoU loss function is used as the bounding box loss function, and non-maximum suppression (NMS) is used as the post-processing algorithm to remove duplicate detection boxes [32].
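To make the SPP-to-SPPF change concrete, the following is a minimal PyTorch sketch of an SPPF block in the spirit of the YOLOv5 implementation; the real module wraps each convolution in batch normalization and a SiLU activation, which are omitted here for brevity.

```python
import torch
import torch.nn as nn

class SPPF(nn.Module):
    """Spatial pyramid pooling-fast: three serial 5x5 max-poolings whose
    outputs are concatenated; equivalent in receptive field to the
    parallel 5/9/13 poolings of SPP but cheaper to compute."""
    def __init__(self, c_in, c_out, k=5):
        super().__init__()
        c_hidden = c_in // 2
        self.cv1 = nn.Conv2d(c_in, c_hidden, 1, 1)
        self.cv2 = nn.Conv2d(c_hidden * 4, c_out, 1, 1)
        self.m = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)

    def forward(self, x):
        x = self.cv1(x)
        y1 = self.m(x)    # one 5x5 pooling
        y2 = self.m(y1)   # two serial 5x5 ~ one 9x9 pooling
        y3 = self.m(y2)   # three serial 5x5 ~ one 13x13 pooling
        return self.cv2(torch.cat((x, y1, y2, y3), dim=1))
```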

3.2. Improved YOLOv5s Model

3.2.1. Adding Coordinate Attention Mechanism

The attention mechanism has been successful in many areas of computer vision; it allows the dynamic re-weighting of input image features. In this paper, the CA mechanism is added to the backbone feature extraction network to capture longer-range dependencies. At the same time, the spatial structure information of the target is used to encode the feature map into a direction-aware and position-sensitive attention map, which is combined with the input feature map to help the model identify targets better. Hou et al. [33] proposed the coordinate attention module at CVPR 2021, mainly as an improvement of squeeze-and-excitation (SE) attention, which ignores position information and focuses only on modeling channel relationships and measuring the importance of each channel. The advantage of the CA module is that it captures not only cross-channel information but also direction-aware and position-sensitive information, thus avoiding the shortcomings of SE attention. The channel information of the feature map is encoded along the horizontal and vertical spatial directions, which captures long-range dependencies along one spatial direction while preserving precise location information along the other, and expands the global receptive field of the network. The structure of the CA module is shown in Figure 5, where X Avg Pool and Y Avg Pool pool along the X and Y axes to extract feature information in the width and height directions. The Concat operation aggregates the feature information from the X and Y axes, after which a convolution (Conv) obtains the long-range dependencies and normalization is carried out. A rectified linear unit (ReLU) activation function is used to obtain the global information of each dimension. Split operations are then carried out along the width and height, followed by convolution and sigmoid activation, respectively. Finally, re-weighting yields a spatial-dimension-based attention mechanism.
The core idea of the CA mechanism is to embed position information into the channel attention, avoiding the loss of position information that 2D global pooling causes when it compresses the feature tensor into a single feature vector [34]. Two parallel one-dimensional feature encodings are adopted: two one-dimensional global pooling operations aggregate the input features along the vertical (h) and horizontal (w) directions, respectively, so that the network obtains larger receptive fields. The output of channel c at height h is calculated as follows:

$$z_c^h(h) = \frac{1}{W} \sum_{0 \le i < W} x_c(h, i) \quad (5)$$

Similarly, $z_c^w(w)$, the output of channel c at width w, is calculated as [35]:

$$z_c^w(w) = \frac{1}{H} \sum_{0 \le j < H} x_c(j, w) \quad (6)$$
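The following is a minimal PyTorch sketch of the coordinate attention block, following the structure of Figure 5 and Equations (5) and (6). The reduction ratio is an assumption, and the original module of Hou et al. [33] uses a hard-swish nonlinearity where plain ReLU is shown here.

```python
import torch
import torch.nn as nn

class CoordAttention(nn.Module):
    """Sketch of the coordinate attention block: pool along H and W
    separately (Eqs. (5)-(6)), encode the two directions jointly, then
    split into two direction-aware attention maps that re-weight x."""
    def __init__(self, channels, reduction=32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # X Avg Pool: (n, c, h, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # Y Avg Pool: (n, c, 1, w)
        self.conv1 = nn.Conv2d(channels, mid, 1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.ReLU(inplace=True)
        self.conv_h = nn.Conv2d(mid, channels, 1)
        self.conv_w = nn.Conv2d(mid, channels, 1)

    def forward(self, x):
        n, c, h, w = x.shape
        zh = self.pool_h(x)                          # (n, c, h, 1)
        zw = self.pool_w(x).permute(0, 1, 3, 2)      # (n, c, w, 1)
        # Concat along the spatial axis, encode, then split back.
        y = self.act(self.bn(self.conv1(torch.cat([zh, zw], dim=2))))
        yh, yw = torch.split(y, [h, w], dim=2)
        ah = torch.sigmoid(self.conv_h(yh))                      # height attention
        aw = torch.sigmoid(self.conv_w(yw.permute(0, 1, 3, 2)))  # width attention
        return x * ah * aw                           # re-weight the input
```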

3.2.2. Replacement Loss Function

This paper uses WIoU to replace the CIoU loss function. Focal-EIoU v1 was proposed to solve the bounding-box regression balance problem between high-quality and low-quality samples, but because of its static focusing mechanism (FM), the potential of a non-monotonic FM was not fully exploited. To address this problem, a dynamic non-monotonic FM based on IoU was proposed, called wise IoU (WIoU). Compared with the state-of-the-art SIoU, WIoU achieves a lower regression error in simulation experiments.
WIoU loss definition:
$$L_{WIoUv1} = R_{WIoU} \cdot L_{IoU} \quad (7)$$

$$R_{WIoU} = \exp\!\left(\frac{(x - x_{gt})^2 + (y - y_{gt})^2}{\left(W_g^2 + H_g^2\right)^*}\right) \quad (8)$$
where Wg and Hg are the width and height of the minimum enclosing box. To prevent gradients that hinder convergence, Wg and Hg are detached from the computational graph (the superscript * denotes this operation), which effectively removes the obstacle to convergence. R_WIoU significantly amplifies the L_IoU of ordinary-quality anchor boxes, while L_IoU reduces the R_WIoU of high-quality anchor boxes and focuses on the distance between center points when the anchor box overlaps the target box well. Replacing the loss function therefore greatly improves the convergence rate of the model.
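As an illustration of Equations (7) and (8), the sketch below computes the WIoU v1 loss for axis-aligned boxes. The (x1, y1, x2, y2) box format and the small epsilon for numerical stability are assumptions, not details taken from the paper.

```python
import torch

def wiou_v1_loss(pred, target, eps=1e-7):
    """WIoU v1 bounding-box loss, Eqs. (7)-(8).
    pred and target are (N, 4) tensors of (x1, y1, x2, y2) boxes."""
    # Intersection and union for the plain IoU term.
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)
    l_iou = 1.0 - iou

    # Center distances and the smallest enclosing box (W_g, H_g).
    cxp = (pred[:, 0] + pred[:, 2]) / 2
    cyp = (pred[:, 1] + pred[:, 3]) / 2
    cxt = (target[:, 0] + target[:, 2]) / 2
    cyt = (target[:, 1] + target[:, 3]) / 2
    wg = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    hg = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])

    # The superscript * in Eq. (8): detach W_g and H_g from the graph
    # so the enclosure term does not produce convergence-hindering gradients.
    r_wiou = torch.exp(((cxp - cxt) ** 2 + (cyp - cyt) ** 2)
                       / (wg.detach() ** 2 + hg.detach() ** 2 + eps))
    return r_wiou * l_iou
```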

3.2.3. Adding a Coordinate Convolution Layer

Coordconv is used to replace the 1 × 1 convolution layers in the feature pyramid network (FPN) [36] and the first convolution layer in the detection head. As shown in Figure 6, the coordconv layer can be seen as an extension of the original convolution layer. First, coordinate information derived from the feature map fills additional channels, which are concatenated with the original feature map; a standard convolution is then applied. In general, channels are prepared for the i and j coordinates, each normalized by a linear transformation to the range [−1, 1], which is sufficient to record the spatial information in the input feature map. If necessary, another channel, the r coordinate, can be introduced to handle specific situations. The r coordinate is calculated as follows:
$$r = \sqrt{\left(i - h/2\right)^2 + \left(j - w/2\right)^2} \quad (9)$$
where i and j represent the basic coordinates and h and w are the size of the feature map [37].
The coordconv layer stores horizontal and vertical pixel information about object boundaries in the additional input channels. The network can therefore output information-rich feature maps after successive convolution operations. The coordconv module effectively highlights detailed information, reduces feature loss, and facilitates pixel-level segmentation, which distinguishes it from the standard convolution layer. Figure 6 compares the coordconv layer with the standard convolution layer; the coordconv layer strengthens boundary information and reduces various types of internal variability [38].
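A minimal PyTorch sketch of a coordconv layer as described above: two coordinate channels, normalized to [−1, 1] as in the text, plus the optional r channel of Equation (9) (computed here from the normalized grids), are concatenated to the input before a standard convolution. The module below is illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn

class CoordConv(nn.Module):
    """Coordconv sketch: append normalized i/j coordinate channels
    (and optionally a radial r channel), then convolve as usual."""
    def __init__(self, c_in, c_out, kernel_size=1, with_r=False, **kwargs):
        super().__init__()
        self.with_r = with_r
        extra = 3 if with_r else 2
        self.conv = nn.Conv2d(c_in + extra, c_out, kernel_size, **kwargs)

    def forward(self, x):
        n, _, h, w = x.shape
        # i/j coordinate grids, linearly normalized to [-1, 1].
        i = torch.linspace(-1, 1, h, device=x.device).view(1, 1, h, 1).expand(n, 1, h, w)
        j = torch.linspace(-1, 1, w, device=x.device).view(1, 1, 1, w).expand(n, 1, h, w)
        feats = [x, i, j]
        if self.with_r:
            feats.append(torch.sqrt(i ** 2 + j ** 2))  # radial channel, cf. Eq. (9)
        return self.conv(torch.cat(feats, dim=1))
```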
Through the above three improvements, the overall network structure of CWC-YOLOv5s is obtained, as shown in Figure 7. The network is divided into the backbone, neck and prediction head: the coordinate attention mechanism is added after the 8th layer of the backbone; the convolutional layers at the 10th, 14th, 18th and 21st layers of the neck are replaced with coordinate convolutional layers; coordinate convolutional layers are added after the 17th, 20th and 23rd layers, respectively; and the loss function is replaced with the WIoU loss function.

3.3. Validation of the CWC-YOLOv5s Model on a Public Data Set

The improved model was verified on the public data set visdrone2019 [39] and the results were obtained, as shown in Figure 8.
Figure 8 shows that the mAP@0.5 of the CWC-YOLOv5s model rises over the 50 iterations and begins to converge after about 30 iterations, with a stable overall trend. Compared with the YOLOv5s model on the public visdrone2019 data set, the overall improvement in mAP@0.5 is obvious. This proves that the CWC-YOLOv5s model is suitable for targeted research and discussion on the smoke data set.

4. Case Study

4.1. Training Results

The smoke data set was used to train the original YOLOv5s model and CWC-YOLOv5s for 300 epochs each, and the box loss, object loss, precision, recall rate and mAP on the training set and the validation set were obtained. In the initial training stage of the two models, the learning efficiency is high and the training curves converge quickly. As the number of epochs increases, the slope of the training curve gradually decreases and stabilizes. The box loss and object loss decrease and gradually converge, while precision and recall continue to rise and level off, as shown in Figure 9 and Figure 10. After about 50 epochs, the models reach a plateau in precision and recall. The box loss and object loss drop sharply in the first 50 epochs of training and then stabilize at about 80 epochs. The experimental results show that the CWC-YOLOv5s model converges well. Therefore, the best weights obtained after 300 epochs of training are selected as the weights for smoke detection.
The smoke detection model based on CWC-YOLOv5s was trained on the smoke data set, and the optimal model weights were obtained. In the simulated oil-cloth fire test in the ship's engine room, the model identified the smoke within the first 22 s of the smoke simulation experiment, whereas the model in the literature [40] identified smoke within 31 s of its test, indicating that the proposed model detects faster. To examine the effect of the improvements, box loss, object loss, precision, recall rate and average precision were compared between the training set and the test set. The training results above show that both training loss and validation loss decrease rapidly in the first 50 epochs, indicating fast convergence; the curves begin to converge after 50 epochs, and the overall trend becomes stable. The comparison results are shown in Figure 11 and Figure 12. Figure 11 shows that the box loss and object loss of CWC-YOLOv5s on both the training set and the test set are overall lower than those of the YOLOv5s model, indicating a good convergence state. Figure 12 shows that the general trends of mAP@0.5, precision, recall and mAP@0.5:0.95 of the CWC-YOLOv5s model are higher than those of the YOLOv5s model, indicating that the overall performance of CWC-YOLOv5s is better than that of the original YOLOv5s.

4.2. Analysis of Experimental Results

4.2.1. Performance Comparison

In addition to the above comparisons, we also compared the improved algorithm with more advanced current object detection algorithms. Specifically, the CWC-YOLOv5s model was compared with SSD, YOLOv3, YOLOv5m, YOLOv7, YOLOv8 and the initial YOLOv5s model, with default parameters, to verify the reliability and accuracy of the proposed improved network. Considering the requirements of the actual scenario, we chose precision, recall, inference time and mAP@0.5 as the four metrics. All models were trained and validated on the same data set. The experimental results are shown in Table 3.
As can be seen from Table 3, the YOLOv5s model has greater detection precision, recall and average precision than SSD, YOLOv3, YOLOv5m and YOLOv7. Compared with these models, the precision of YOLOv5s increased by 5.09%, 14.9%, 0.6% and 17%, the recall rate by 2.17%, 6.4%, 3.5% and 14.4%, and mAP@0.5 by 4.95%, 10.6%, 3.6% and 14.4%, respectively. Compared with the YOLOv5s model, the CWC-YOLOv5s model improved precision by 1.4%, recall by 4.6% and mAP@0.5 by 2.2%, at the cost of 1.4 ms of additional inference time. CWC-YOLOv5s also improved precision by 0.6%, recall by 11% and mAP@0.5 by 4% compared with YOLOv8. Considering precision, recall, inference time and mAP@0.5 together, CWC-YOLOv5s outperforms the other object detection models in the table.

4.2.2. Ablation Experiment

To comprehensively verify the optimization results of various improved modules and further evaluate the impact of the improved algorithms on the YOLOv5s model, ablation experiment tests were conducted for each improvement in this study, and the test results are shown in Table 4.
According to the data in the table, network 1 is the original YOLOv5s, with an mAP@0.5 of 91.1%, a recall rate of 83.5% and a precision of 90.4%. In network 2, the CA mechanism is added to the backbone network, which increases mAP@0.5 by 0.4%, the recall rate by 1.2% and precision by 0.7%. In network 3, WIoU replaces the CIoU loss function on the basis of network 2; detection precision, recall and mAP@0.5 all improve, with the recall rate showing the largest gain of 2%. In network 4, on the basis of network 3, part of the convolutional layers in the neck are changed into coordinate convolutional layers; mAP@0.5 reaches 93.3%, 2.2% higher than the original YOLOv5s network model, which proves the superiority and effectiveness of CWC-YOLOv5s and significantly improves detection accuracy.

4.3. Analysis of Detection Results

To verify the feasibility of the proposed model and compare the improved network more intuitively, the YOLOv5s model and the CWC-YOLOv5s model were tested on the ship's engine room flame simulation from the ship video system. To make the comparison clearer, the confidence threshold of both networks was set to 0.25. The test results are shown in Figure 13. The verification process captures smoke scene images at two different times for detection and comparison. The box in the figure marks the detection area, and the label value gives the confidence of the model, between 0 and 1; the higher the confidence value, the better the current model matches the target, and vice versa.
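For reference, loading trained weights and applying the 0.25 confidence threshold can be done through the torch.hub interface of the YOLOv5 repository; the weight file best.pt and the image name below are illustrative.

```python
import torch

# Load custom trained weights through the standard YOLOv5 torch.hub entry
# point; "best.pt" and the frame path are illustrative names.
model = torch.hub.load("ultralytics/yolov5", "custom", path="best.pt")
model.conf = 0.25          # confidence threshold used in Section 4.3

results = model("engine_room_frame.jpg")
results.print()            # class, confidence and box for each detection
results.save()             # writes the annotated image to runs/detect/
```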
Figure 13a shows smoke simulation scenario 1 of the oil rag fire simulation experiment in the engine room workshop. In Figure 13b, the YOLOv5s model detects the smoke with a confidence of 0.69; after the improvement to YOLOv5s, the confidence rises to 0.78 (Figure 13c). Figure 13d shows smoke simulation scenario 2 of the same experiment. In Figure 13e, the YOLOv5s model reaches a confidence of 0.80; the improved model again does better, reaching 0.85, as shown in Figure 13f. The confidence of CWC-YOLOv5s detection is therefore greater.
To further verify the effectiveness of the proposed model, the CWC-YOLOv5s model was tested on the laboratory smoke simulation bench. The test results are shown in Figure 14. The verification process captures smoke scene images from two perspectives, near and far, for detection and comparison. Figure 14a shows the close-distance simulated smoke scene between the engine room pipes. In Figure 14b, the YOLOv5s model reaches a confidence of 0.70, while the CWC-YOLOv5s model scores 0.08 higher, as shown in Figure 14c. Figure 14d shows the far-distance simulated smoke scene between the engine room pipes. In Figure 14e, the YOLOv5s model reaches a confidence of 0.69, and the CWC-YOLOv5s model reaches 0.75, as shown in Figure 14f.
Comparing the overall experimental results, the CWC-YOLOv5s model is superior to the YOLOv5s model in smoke detection, with greater robustness, better performance, and higher detection accuracy and recognition confidence. The experimental results show that the improved model outperforms the original model in actual smoke detection in the engine room.

5. Conclusions

To ensure the safety of personnel and property in the event of a fire in a ship's engine room, a ship smoke detection model based on CWC-YOLOv5s is proposed, which aims at remedying the shortcomings of existing smoke detection algorithms for marine engine rooms, such as high error rates, insufficient accuracy and large differences in effect across data sets. Based on the baseline YOLOv5s model, the coordinate attention mechanism is added, the CIoU loss function is replaced with WIoU, and coordinate convolution layers are added, which speeds up model convergence and improves regression accuracy. When tested on different data sets, the precision and recall rate are higher than those of the original comparison model. The experimental results show that the overall performance of the proposed CWC-YOLOv5s model is superior to that of YOLOv5s, with good accuracy and speed, reaching 91.8% precision and an 88.1% recall rate on the data set; however, there is still much room for improvement in the average detection speed. For future development, research can proceed in the following directions:
(1)
The ship smoke data set needs to be expanded to cover more ship fire smoke scenarios and improve the quality of the fire smoke data set. Because of the small number of ship data samples, over-fitting can easily occur during training. In the future, data augmentation and other technologies are needed to artificially expand the training data to improve the effectiveness of the model.
(2)
The motion changes of cabin smoke in video sequences can be studied, and some dynamic changes can be added to improve the performance of smoke detection in the future.
(3)
In the next work, the model size can be reduced without sacrificing performance to improve the average detection speed.

Author Contributions

Y.Z. (Yongjiu Zou) and J.Z. contributed equally to this work. Conceptualization, Y.Z. (Yongjiu Zou) and J.Z.; methodology, J.Z. and P.Z.; software, J.Z.; validation, J.Z.; data curation, Y.Z. (Yongjiu Zou) and J.Z.; writing—original draft preparation, Y.Z. (Yongjiu Zou) and J.Z.; writing—review and editing, P.Z., T.D., X.J., H.W. and Y.Z. (Yuewen Zhang); supervision, P.Z., Y.Z. (Yuewen Zhang) and P.S. All authors have read and agreed to the published version of the manuscript.

Funding

The work was supported by the National Natural Science Foundation of China (Grant Nos. 52101400, 52101345), the Scientific Research Fund of the Educational Department of Liaoning Province (LJKMZ20220359), the CAS Key Laboratory of Separation Sciences for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Lee, G.J.; Lee, D.; Choi, J.; Kang, H.J. A Concept Study on Design Alternatives for Minimizing Accident Consequences in Maritime Autonomous Surface Ships. J. Mar. Sci. Eng. 2023, 11, 907. [Google Scholar] [CrossRef]
  2. Liu, Y.; Zhang, H.; Zhan, Y.; Deng, K.; Dong, L. Evacuation Strategy Considering Path Capacity and Risk Level for Cruise Ship. J. Mar. Sci. Eng. 2022, 10, 398. [Google Scholar] [CrossRef]
  3. Ventikos, N.P.; Sotiralis, P.; Annetis, M.; Podimatas, V.C.; Boulougouris, E.; Stefanidis, F.; Chatzinikolaou, S.; Maccari, A. The Development and Demonstration of an Enhanced Risk Model for the Evacuation Process of Large Passenger Vessels. J. Mar. Sci. Eng. 2023, 11, 84. [Google Scholar] [CrossRef]
  4. Zhang, H.; Li, C.; Zhao, N.; Chen, B.-Q.; Ren, H.; Kang, J. Fire Risk Assessment in Engine Rooms Considering the Fire-Induced Domino Effects. J. Mar. Sci. Eng. 2022, 10, 1685. [Google Scholar] [CrossRef]
  5. Bu, F.; Gharajeh, M.S. Intelligent and vision-based fire detection systems: A survey. Image Vis. Comput. 2019, 91, 103803. [Google Scholar] [CrossRef]
  6. Kuo, H.C.; Chang, H.K. A real-time shipboard fire-detection system based on grey-fuzzy algorithms. Fire Saf. J. 2003, 38, 341–363. [Google Scholar] [CrossRef]
  7. Wang, S.-J.; Jeng, D.-L.; Tsai, M.-T. Early fire detection method in video for vessels. J. Syst. Softw. 2009, 82, 656–667. [Google Scholar] [CrossRef]
  8. Zou, Y.; Sun, M.; Xu, W.; Zhao, X.; Du, T.; Sun, P.; Xu, M. Advances in Marine Self-Powered Vibration Sensor Based on Triboelectric Nanogenerator. J. Mar. Sci. Eng. 2022, 10, 1348. [Google Scholar] [CrossRef]
  9. Hammond, M.; Rosepehrsson, S.; Gottuk, D.; Lynch, J.; Tillett, D.; Streckert, H. Cermet microsensors for fire detection. Sens. Actuators B Chem. 2008, 130, 240–248. [Google Scholar] [CrossRef]
  10. Jia, Y.; Yuan, J.; Wang, J.; Fang, J.; Zhang, Q.; Zhang, Y. A Saliency-Based Method for Early Smoke Detection in Video Sequences. Fire Technol. 2015, 52, 1271–1292. [Google Scholar] [CrossRef]
  11. Park, K.M.; Bae, C.O. Smoke detection in ship engine rooms based on video images. IET Image Process. 2020, 14, 1141–1149. [Google Scholar] [CrossRef]
  12. Töreyin, B.U.; Dedeoğlu, Y.; Güdükbay, U.; Çetin, A.E. Computer vision based method for real-time fire and flame detection. Pattern Recognit. Lett. 2006, 27, 49–58. [Google Scholar] [CrossRef]
  13. Chen, R.; Xu, Y.-A. Threshold optimization selection of fast multimedia image segmentation processing based on Labview. Multimed. Tools Appl. 2019, 79, 9451–9467. [Google Scholar] [CrossRef]
  14. Shi, G.; Li, X.; Huang, B.; Yan, X. Targets detection in smoke-screen image sequences using fractal and rough set theory. In Proceedings of the 2015 International Conference on Optical Instruments and Technology Optoelectronic Imaging and Processing Technology, Beijing, China, 17–19 May 2015; Volume 9622, p. 962214. [Google Scholar]
  15. Chen, Y.; Li, Z. An Effective Approach of Vehicle Detection Using Deep Learning. Comput. Intell. Neurosci. 2022, 2022, 2019257. [Google Scholar] [CrossRef] [PubMed]
  16. Kim, D.; Ruy, W. CNN-based fire detection method on autonomous ships using composite channels composed of RGB and IR data. Int. J. Nav. Archit. Ocean. Eng. 2022, 14, 100489. [Google Scholar] [CrossRef]
  17. Lu, S. Deep learning for object detection in video. J. Phys. Conf. Ser. 2019, 1176, 042080. [Google Scholar] [CrossRef]
  18. Wu, S.; Zhang, L. Using Popular Object Detection Methods for Real Time Forest Fire Detection. In Proceedings of the 2018 11th International Symposium on Computational Intelligence and Design (ISCID), Wuhan, China, 8–9 December 2018. [Google Scholar]
  19. Li, P.; Zhao, W. Image fire detection algorithms based on convolutional neural networks. Case Stud. Therm. Eng. 2020, 19, 100625. [Google Scholar] [CrossRef]
  20. Chen, N.; Man, Y.; Sun, Y. Abnormal Cockpit Pilot Driving Behavior Detection Using YOLOv4 Fused Attention Mechanism. Electronics 2022, 11, 2538. [Google Scholar] [CrossRef]
  21. Zheng, H.; Duan, J.; Dong, Y.; Liu, Y. Real-time fire detection algorithms running on small embedded devices based on MobileNetV3 and YOLOv4. Fire Ecol. 2023, 19, 31. [Google Scholar] [CrossRef]
  22. Avazov, K.; Hyun, A.E.; Sami S, A.A.; Khaitov, A. Forest Fire Detection and Notification Method Based on AI and IoT Approaches. Sensors 2023, 15, 61. [Google Scholar] [CrossRef]
  23. Mukhiddinov, M.; Abdusalomov, A.B.; Cho, J. Automatic Fire Detection and Notification System Based on Improved YOLOv4 for the Blind and Visually Impaired. Sensors 2022, 22, 3307. [Google Scholar] [CrossRef]
  24. Norkobil Saydirasulovich, S.; Abdusalomov, A.; Jamil, M.K.; Nasimov, R.; Kozhamzharova, D.; Cho, Y.I. A YOLOv6-Based Improved Fire Detection Approach for Smart City Environments. Sensors 2023, 23, 3161. [Google Scholar] [CrossRef] [PubMed]
  25. Wang, X.; Wu, Z.; Jia, M.; Xu, T.; Pan, C.; Qi, X.; Zhao, M. Lightweight SM-YOLOv5 Tomato Fruit Detection Algorithm for Plant Factory. Sensors 2023, 23, 3336. [Google Scholar] [CrossRef] [PubMed]
  26. Wang, J.; Dong, Y.; Zhao, S.; Zhang, Z. A High-Precision Vehicle Detection and Tracking Method Based on the Attention Mechanism. Sensors 2023, 23, 724. [Google Scholar] [CrossRef]
  27. Guo, S.; Li, L.; Guo, T.; Cao, Y.; Li, Y. Research on Mask-Wearing Detection Algorithm Based on Improved YOLOv5. Sensors 2022, 22, 4933. [Google Scholar] [CrossRef]
  28. Xu, J.; Zou, Y.; Tan, Y.; Yu, Z. Chip Pad Inspection Method Based on an Improved YOLOv5 Algorithm. Sensors 2022, 22, 6685. [Google Scholar] [CrossRef] [PubMed]
  29. Wang, Z.; Wu, L.; Li, T.; Shi, P. A Smoke Detection Model Based on Improved YOLOv5. Mathematics 2022, 10, 1190. [Google Scholar] [CrossRef]
  30. Wang, H.; Jin, Y.; Ke, H.; Zhang, X. DDH-YOLOv5: Improved YOLOv5 based on Double IoU-aware Decoupled Head for object detection. J. Real-Time Image Process. 2022, 19, 1023–1033. [Google Scholar] [CrossRef]
  31. Xue, J.; Zheng, Y.; Dong-Ye, C.; Wang, P.; Yasir, M. Improved YOLOv5 network method for remote sensing image-based ground objects recognition. Soft Comput. 2022, 26, 10879–10889. [Google Scholar] [CrossRef]
  32. Qiu, S.; Li, Y.; Zhao, H.; Li, X.; Yuan, X. Foxtail Millet Ear Detection Method Based on Attention Mechanism and Improved YOLOv5. Sensors 2022, 22, 8206. [Google Scholar] [CrossRef]
  33. Hou, Q.; Zhou, D.; Feng, J. Coordinate Attention for Efficient Mobile Network Design. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021. [Google Scholar]
  34. Xiao, Z.; Sun, E.; Yuan, F.; Peng, J.; Liu, J. Detection Method of Damaged Camellia Oleifera Seeds Based on YOLOv5-CB. IEEE Access 2022, 10, 126133–126141. [Google Scholar] [CrossRef]
  35. Hong, W.; Ma, Z.; Ye, B.; Yu, G.; Tang, T.; Zheng, M. Detection of Green Asparagus in Complex Environments Based on the Improved YOLOv5 Algorithm. Sensors 2023, 23, 1562. [Google Scholar] [CrossRef] [PubMed]
  36. Zhang, T.; Zhang, X.; Ke, X. Quad-FPN: A Novel Quad Feature Pyramid Network for SAR Ship Detection. Remote Sens. 2021, 13, 2771. [Google Scholar] [CrossRef]
  37. Wang, S.; Yang, H.; Wu, Q.; Zheng, Z.; Wu, Y.; Li, J. An Improved Method for Road Extraction from High-Resolution Remote-Sensing Images that Enhances Boundary Information. Sensors 2020, 20, 2064. [Google Scholar] [CrossRef]
  38. Yao, X.; Yang, H.; Wu, Y.; Wu, P.; Wang, B.; Zhou, X.; Wang, S. Land Use Classification of the Deep Convolutional Neural Network Method Reducing the Loss of Spatial Features. Sensors 2019, 19, 2792. [Google Scholar] [CrossRef]
  39. Yang, R.; Li, W.; Shang, X.; Zhu, D.; Man, X. KPE-YOLOv5: An Improved Small Target Detection Algorithm Based on YOLOv5. Electronics 2023, 12, 817. [Google Scholar] [CrossRef]
  40. Jiang, X.; Liu, Y.; Song, Z.; Mu, S.; Zhang, P.; Zhang, Y.; Sun, P. Image detection method of Marine engine room fire based on transfer learning. J. Dalian Marit. Univ. 2023, 49, 103–109+116. [Google Scholar] [CrossRef]
Figure 1. Smoke region distribution of the dataset. (a) Label location of the dataset. (b) Label size of the dataset.
Figure 2. Simulation diagram of cabin smoke scene.
Figure 3. Design flow of smoke detection in ship's engine room.
Figure 4. YOLOv5s network diagram (the number of each block is marked with black numbers on the left side of the block).
Figure 5. Coordinate attention block.
Figure 6. Convolutional layer and coordinate convolutional layer diagram.
Figure 7. Structure diagram of CWC-YOLOv5s.
Figure 8. The validation result of CWC-YOLOv5s using the visdrone2019 dataset.
Figure 9. Training results of YOLOv5s.
Figure 10. Training results of CWC-YOLOv5s.
Figure 11. Results comparison of loss values. (a) Train/box loss. (b) Train/obj loss. (c) Val/box loss. (d) Val/obj loss.
Figure 12. Comparison of 300 training results. (a) Metrics/precision. (b) Metrics/recall. (c) Metrics/mAP_0.5. (d) Metrics/mAP_0.5:0.95.
Figure 13. Comparison of test results onboard. (a) Smoke simulation scenario 1. (b) YOLOv5s detection with 69% confidence. (c) CWC-YOLOv5s detection with 78% confidence. (d) Smoke simulation scenario 2. (e) YOLOv5s detection with 80% confidence. (f) CWC-YOLOv5s detection with 85% confidence.
Figure 14. Comparison of test results in onshore laboratory. (a) Laboratory close-distance smoke scene. (b) YOLOv5s detection with 70% confidence. (c) CWC-YOLOv5s detection with 78% confidence. (d) Laboratory far-distance smoke scene. (e) YOLOv5s detection with 69% confidence. (f) CWC-YOLOv5s detection with 75% confidence.
Table 1. Image data sets.

Dataset   Training Images   Testing Images   Validation Images   Total
Set 1     1581              197              197                 1975
Set 2     2780              347              347                 3474
Table 2. Model deployment environment.

Experimental Environment   Configuration
CPU                        Intel(R) Core(TM) i7-9750H
GPU                        NVIDIA GeForce GTX 1050
Operating system           Win10 × 64
Deep learning library      PyTorch 1.13.0
Dependency library         CUDA 10.2
Programming environment    Python 3.7
Memory type                DDR4-2666, LPDDR3-2133
Storage                    SSD: 512 GB
Table 3. Performance comparison results of different models.

Network        Precision (%)   Recall (%)   Inference Time (ms)   mAP@0.5 (%)
SSD            85.31           81.33        19                    86.15
YOLOv3         75.5            77.1         19.9                  80.5
YOLOv5m        89.8            80           52.9                  87.5
YOLOv7         73.4            69.1         78.9                  76.7
YOLOv8         91.2            77.1         24.1                  89.3
YOLOv5s        90.4            83.5         21.6                  91.1
CWC-YOLOv5s    91.8            88.1         23                    93.3
Table 4. Results of ablation experiment.

NO   Network        Attention Mechanism   Loss Function   Coordinate Convolution Layer   mAP@0.5 (%)   Recall (%)   Precision (%)
1    YOLOv5s        ×                     ×               ×                              91.1          83.5         90.4
2    C-YOLOv5s      √                     ×               ×                              91.5          84.7         91.1
3    CW-YOLOv5s     √                     √               ×                              91.9          86.7         91.2
4    CWC-YOLOv5s    √                     √               √                              93.3          88.1         91.8
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
