1. Introduction
The detection and management of fires aboard ships is a critical aspect of maritime safety. With the increasing reliance on automated systems for monitoring and surveillance, the development of robust ML models for ship fire detection has become essential. However, the effectiveness of these models is often constrained by the availability and diversity of training data. In ocean environments, ship fire incidents are rare and difficult to capture, owing to both the natural conditions of the maritime environment and the operational characteristics of ships, resulting in a scarcity of labeled data for model training. Ship fires are relatively infrequent thanks to the extensive safety measures implemented on board, and ocean scenes are often hazy. In addition, when ship fires do occur, access to the site is heavily restricted; for safety reasons, only trained personnel are allowed near the incident, which reduces the likelihood of images being taken and subsequently shared. Furthermore, many ship fires occur in remote areas of the ocean, far from the reach of media and casual observers, making it difficult to capture images or to reach the site promptly. Nevertheless, a certain number of ship and ship fire images can be gathered from internet sources, such as the Google Images search engine. According to World Shipping Council (WSC) statistics [1], the number of container ship fires is trending upward, with a ship fire estimated to occur, on average, every 60 days. For the reasons above, and to address this scarcity, we aimed to create a ship fire dataset using augmentation techniques. Numerous studies have assessed the effectiveness of data augmentation by leveraging well-known academic image datasets.
The success of ML models, especially in computer vision (CV) tasks, is heavily dependent on the quality and quantity of training data. It is widely recognized that larger datasets tend to enhance the performance of deep learning (DL) models. Many researchers [2,3] highlight that models trained on extensive datasets generally exhibit superior accuracy and robustness compared to those trained on smaller datasets. This is because larger datasets provide a broader representation of possible scenarios, reducing the likelihood of overfitting and improving the model's ability to generalize to new data.
A common issue with small datasets in CV is that trained models struggle to generalize to validation and test sets [4]. This lack of generalization is often due to overfitting, where the model learns to perform well on the training data but fails to adapt to new, unseen data. This issue is exacerbated in domains where data collection is challenging or expensive, leading to a reliance on small datasets. Several advanced techniques have been developed to address the limitations of smaller datasets when developing DL models [5,6,7,8,9]. These include dropout regularization [10,11,12], batch normalization [13,14], and transfer learning [15,16,17]. Dropout involves randomly dropping units during training, which helps to prevent overfitting by ensuring that the model does not rely too heavily on any single node. In batch normalization, the inputs of each layer are normalized to have a mean of zero and a standard deviation of one, which helps to stabilize the training process and improve convergence speed. Transfer learning involves taking a model pre-trained on a large dataset and fine-tuning it on a smaller, task-specific dataset; this approach leverages the features learned on the large dataset, which is particularly beneficial when labeled data are scarce. Wang et al. [18], for example, explored and compared multiple data augmentation solutions, focusing on image classification tasks and experiments. Their study underscores the effectiveness of data augmentation techniques in enhancing model performance. However, assembling extensive datasets presents a formidable challenge, primarily due to the substantial manual effort required for data collection and annotation. This is particularly true for specialized tasks, such as detecting ship fires, where images are rare and often difficult to capture. To evaluate the impact of data augmentation on classification accuracy, it is beneficial to perform a comparative analysis of widely recognized image classification architectures [19,20,21,22,23].
The selected datasets, CIFAR-10/100 and SVHN, are widely used benchmarks in the field of CV, providing a comprehensive basis for evaluating the effectiveness of data augmentation, as can be seen in Table 1. These datasets have enabled many researchers to run extensive experiments and compare the performance of data augmentation techniques. Data augmentation is thus a powerful technique for training DL models. It enhances the diversity and volume of the training dataset without the need for additional manual labeling; with a larger dataset, the model can learn better and avoid overfitting. In many cases, especially in fields like autonomous driving and medical imaging, collecting data for all possible real-world conditions is impractical. In addition, augmentation can address class imbalance by generating more examples of underrepresented classes.
For instance, Esteva et al. [24] note that deep convolutional neural networks (D-CNNs) can perform remarkably well in medical image analyses, such as skin lesion classification tasks [25]. Reported experiments on CIFAR-100 show an increase in performance from 66% to 73%. As shown in Table 2, augmentation techniques applied in feature and input space have increased accuracy on the MNIST and CIFAR-10 datasets.
Several studies have demonstrated the advantages of data augmentation for training DL models. In this paper, we propose comprehensive data augmentation techniques based on image blending. By automatically removing the background from source images and integrating these into various target image backgrounds, we can create a vast and diverse dataset that simulates numerous real-world conditions. Figure 1 illustrates our proposed approach, which combines basic and advanced data augmentation techniques.
As previously mentioned, D-CNNs perform well in CV tasks with big data applications. This research focuses on handling data scarcity related to ocean vessels. We augmented ship fire images by blending source and target images, where the source image is a ship on fire and the target images are various ocean environment scenes. To augment, the backgrounds of ship fire images are removed and the extracted ships are then attached to the target images. During blending, we applied both basic and advanced augmentation approaches. In Section 3, we describe in more detail the methods that we utilized in this work. We anticipate that applying augmentation techniques, such as noise injection and scaling, will significantly enhance the robustness of our generated dataset. These techniques enable us to create hazy ocean environmental images and simulate distant localization, which in turn aids the detection and recognition of small objects. To evaluate the proposed method's benefit for image classification, especially ship fire detection, we fine-tuned Yolo (You Only Look Once) models, namely Yolo-v8 and Yolo-v10, and compared our results with other Yolo-based methods in Section 5.
The main contributions of this research are as follows:
Collection of ship and ship fire images from internet sources.
Object-extracted (OE) blending of collected images with basic and advanced approaches. Note that our study does not cover every basic and advanced technique.
An open-source Gradio web application for data augmentation, as applied in our work. The application can also be used to create other scarce datasets.
To provide a comprehensive understanding of our research on data augmentation for the ship fire dataset, we have structured this paper as follows: Section 2 reviews related research, surveying the existing literature on data augmentation methods, datasets, and algorithms. Section 3 details the unique contributions of this study, specifically the techniques we employed for augmenting ship fire images by blending them into various ocean environment images. The key steps include background removal and blending with various ocean images, placing the source image at a random location in the target image; the blending methods include random positioning, noise injection, rotation, cropping, scaling, and color and contrast changes. Section 4 highlights the development and functionality of the Gradio web application, created to facilitate the testing and visualization of our data augmentation techniques. The final section, Section 5, summarizes the potential research directions and key contributions.
3. Proposed Methods and Model Architecture
3.1. Background Removal
As seen in Section 2, by artificially enlarging the training dataset through various transformations, data augmentation has become significantly helpful for developing robust models: it mitigates overfitting, compensates for scarce datasets, improves model generalization, and boosts overall accuracy. In this research, we propose a data augmentation approach that blends source and target images, applying a suite of augmentation techniques during the blending process. Background-removed ship fire images are blended into target ocean images, with randomness applied in every technique we use: random resizing, cropping, blurring, positioning, color changes, and the usual flipping approaches.
Background removal, a specific form of augmentation, has gained prominence for its ability to isolate objects of interest from non-essential elements in an image. The image background removal library we use in our project was developed by Daniel Gatis. Removing the background from an image is a technique in image processing where the non-essential parts of an image are excluded, leaving only the foreground objects of interest. This process typically involves distinguishing the primary subject from the background using algorithms such as thresholding, edge detection, segmentation, and deep learning-based models. In our approach, the initial step processes the source ship image, focusing on the primary subject and removing its background. A given image $I$ is represented as a matrix of pixel values:

$$I = \{ I(i, j) \mid 1 \le i \le M,\ 1 \le j \le N \}$$

where $M$ and $N$ are the dimensions of the image, and $I(i, j)$ denotes the pixel value at position $(i, j)$.
Next, a segmentation algorithm is applied to divide the image into regions, producing a segmentation mask $S$:

$$S(i, j) \in \{0, 1\}$$

where $S(i, j)$ is a binary value indicating whether pixel $(i, j)$ belongs to the foreground (1) or the background (0).
After that, the processed image $I'$ after background removal is obtained by element-wise multiplication of the original image $I$ and the segmentation mask $S$:

$$I' = I \circ S$$

where $\circ$ denotes the Hadamard product (element-wise multiplication).
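As a minimal sketch, this step can be reproduced with the rembg library (the Daniel Gatis library mentioned above); the file names are illustrative placeholders, not our actual dataset paths:

```python
# Minimal background-removal sketch using the rembg library.
# File names are illustrative placeholders.
from rembg import remove
from PIL import Image

ship = Image.open("ship_fire.jpg")      # original image I
ship_rgba = remove(ship)                # RGBA output; alpha encodes the mask S
ship_rgba.save("ship_fire_no_bg.png")   # effectively I' = I ∘ S
```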
The blending function is designed to seamlessly overlay a ship image onto a background image at a specified position, ensuring proper handling of transparency. This function accepts three parameters, as can be seen in Table 3. The RGBA conversion ensures that the background image supports transparency by including an alpha channel. The alpha channel acts as a mask that delineates the transparent and opaque regions of the ship image. The mask ensures that only the non-transparent parts of the ship image are pasted onto the background, allowing for smooth blending and preserving the visual integrity of both images.
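A minimal sketch of such a blending function is shown below; the parameter names are ours and do not necessarily match Table 3:

```python
from PIL import Image

def blend_images(ship: Image.Image, background: Image.Image,
                 position: tuple) -> Image.Image:
    """Overlay a background-removed ship onto a background at `position`."""
    background = background.convert("RGBA")  # ensure an alpha channel exists
    ship = ship.convert("RGBA")
    out = background.copy()
    # Passing the ship itself as the third argument uses its alpha channel
    # as the paste mask, so only non-transparent pixels are copied over.
    out.paste(ship, position, ship)
    return out.convert("RGB")
```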
3.2. Image Resizing and Positioning
The input image is converted to RGBA mode to ensure the image has four channels: red, green, blue, and alpha (transparency). The inclusion of the alpha channel is particularly important for preserving transparency information, which is crucial when compositing the resized image onto different backgrounds. The dimensions of the resized image are computed by multiplying the original width and height of the ship image by the scale factor; integer typecasting ensures the new dimensions are whole numbers. The image is resized to the newly calculated dimensions using the LANCZOS resampling algorithm, a popular method for image scaling.
To simulate different scales and perspectives, the isolated object is resized using a scaling factor, randomly selected within a predefined range to ensure variability. The resized object is then positioned onto the target background image at random coordinates, creating a natural appearance of the object within different contexts.
Scaling by a factor of $s$:

$$(x', y') = (s \cdot x,\ s \cdot y)$$

where $(x, y)$ are the coordinates in the original image, and $(x', y')$ are the coordinates in the scaled image.
A more formal representation of the LANCZOS resampling process for a discrete set of pixels is given by:

$$I'(x) = \sum_{i} I(i)\, L(x - i)$$

where:
- $I(i)$ is the original pixel value at position $i$;
- $I'(x)$ is the resampled pixel value at the new position $x$;
- $L(x - i)$ is the Lanczos kernel applied to the distance between the original and new pixel positions.
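The following sketch combines the LANCZOS resize with random scaling and positioning; the scale range is an assumption chosen for illustration:

```python
import random
from PIL import Image

def resize_and_place(ship: Image.Image, background: Image.Image,
                     scale_range=(0.3, 1.0)) -> Image.Image:
    ship = ship.convert("RGBA")
    s = random.uniform(*scale_range)                        # scale factor s
    new_size = (int(ship.width * s), int(ship.height * s))  # integer typecast
    ship = ship.resize(new_size, Image.Resampling.LANCZOS)  # Lanczos resampling
    # Random top-left coordinates (clamped) keep the ship inside the frame.
    x = random.randint(0, max(0, background.width - ship.width))
    y = random.randint(0, max(0, background.height - ship.height))
    out = background.convert("RGBA").copy()
    out.paste(ship, (x, y), ship)
    return out
```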
3.3. Random Flip, Rotation and Combination
The input image was rotated at random angles to simulate different orientations. It was then combined with the target background image using blending techniques that ensure smooth integration and a natural appearance. This step resulted in a final augmented image that was saved for later use.
Vertical flip:

$$I'(x, y) = I(x,\ H - 1 - y)$$

where $I(x, y)$ is the pixel at coordinates $(x, y)$ in the original image, $W$ is the width, and $H$ is the height.
Rotation by an angle $\theta$:

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$$

where $(x, y)$ are the coordinates in the original image, and $(x', y')$ are the coordinates in the rotated image.
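A sketch of the random flip and rotation step; the flip probability and angle range are our assumptions:

```python
import random
from PIL import Image

def random_flip_rotate(ship: Image.Image) -> Image.Image:
    ship = ship.convert("RGBA")
    if random.random() < 0.5:                  # vertical flip: y -> H - 1 - y
        ship = ship.transpose(Image.Transpose.FLIP_TOP_BOTTOM)
    theta = random.uniform(0.0, 360.0)         # random rotation angle
    # expand=True enlarges the canvas so rotated corners are not clipped;
    # the new corner pixels stay transparent, keeping the alpha mask valid.
    return ship.rotate(theta, expand=True, resample=Image.Resampling.BICUBIC)
```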
3.4. Blur Effects
We applied a Gaussian blur effect to simulate various levels of atmospheric effects and depth of field. Noise injection adds a matrix of values, usually drawn from a Gaussian distribution [4], and can help networks learn more robust features. Our work uses noise injection to prevent the model from learning the noise in the training data, which can lead to overfitting. By making the model robust to noise, it learns to generalize better to unseen data and reaches training stability. Our noise injection equation is as follows:

$$I'(x, y) = I(x, y) + \eta, \qquad \eta \sim \mathcal{N}(\mu, \sigma^2)$$

where $\eta$ is Gaussian noise with mean $\mu$ and variance $\sigma^2$.
In the next step, we applied a blur filter to the image. Gaussian blur is a smoothing operation that reduces the high-frequency components in the image, effectively introducing an atmospheric effect. We set the blur radius between 1 and 20 and applied it to the blended ship image at random values. In the Gradio web app, the user can set the blur radius to zero, depending on their purpose [64,65].
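A sketch of the noise injection and Gaussian blur steps, following the equation above; the default noise parameters are illustrative assumptions:

```python
import random
import numpy as np
from PIL import Image, ImageFilter

def add_noise_and_blur(img: Image.Image, mu: float = 0.0,
                       sigma: float = 10.0) -> Image.Image:
    arr = np.asarray(img.convert("RGB"), dtype=np.float32)
    eta = np.random.normal(mu, sigma, arr.shape)   # eta ~ N(mu, sigma^2)
    noisy = np.clip(arr + eta, 0, 255).astype(np.uint8)
    out = Image.fromarray(noisy)
    radius = random.randint(1, 20)                 # random blur radius in [1, 20]
    return out.filter(ImageFilter.GaussianBlur(radius))
```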
3.5. Change Color Channels and White Balance Adjustment
Lighting biases are among the most frequently encountered challenges in image recognition tasks. Digital image data are typically represented as a tensor with dimensions corresponding to height, width, and color channels [4]. Chatfield et al. [43] report a 3% accuracy drop between grayscale and RGB image classification.
Implementing augmentations within the color channel space is an effective and straightforward approach: the image is decomposed into its individual color channels,

$$I(x, y) = \big[ I_R(x, y),\ I_G(x, y),\ I_B(x, y) \big]$$

where $I_R$, $I_G$, and $I_B$ represent the red, green, and blue channels, respectively.
The white balance function is designed to correct color imbalances between source ship images and target ocean background images by performing white balance adjustment. This ensures that the colors in the image appear natural and accurate under varying lighting conditions. Proper white balance adjustment is crucial for maintaining color fidelity, especially in applications such as digital photography, image analysis, and visual perception studies.
To calculate the average intensity of each color channel $c$, we compute $\bar{I}_c$:

$$\bar{I}_c = \frac{1}{N} \sum_{i=1}^{N} I_{c,i}$$

where $I_{c,i}$ represents the intensity of the $i$-th pixel in channel $c$, and $N$ is the total number of pixels in the channel.
Afterwards, we compute the scaling factor of channel $c$ as:

$$s_c = \frac{\mathrm{avg}}{\bar{I}_c}, \qquad \mathrm{avg} = \frac{\bar{I}_R + \bar{I}_G + \bar{I}_B}{3}$$

where avg is the average intensity computed across all channels.
Last, we adjust the pixel intensities $I_{c,i}$ in channel $c$ using:

$$I'_{c,i} = s_c \cdot I_{c,i}$$
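A minimal gray-world white-balance sketch implementing the three equations above:

```python
import numpy as np
from PIL import Image

def white_balance(img: Image.Image) -> Image.Image:
    arr = np.asarray(img.convert("RGB"), dtype=np.float32)
    channel_avg = arr.reshape(-1, 3).mean(axis=0)  # average intensity per channel
    avg = channel_avg.mean()                       # avg across all channels
    s = avg / channel_avg                          # scaling factor s_c
    balanced = np.clip(arr * s, 0, 255)            # I'_{c,i} = s_c * I_{c,i}
    return Image.fromarray(balanced.astype(np.uint8))
```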
3.6. Change Contrast Button and Equalize Histogram
The contrast of the image is randomly adjusted to simulate different lighting conditions. This technique is used to improve the contrast of the image, making the objects more distinct.
$$I'(x, y) = \alpha \big( I(x, y) - \bar{I} \big) + \bar{I}$$

where $\alpha$ is the contrast factor, and $\bar{I}$ is the mean intensity of the image.
The histogram equalization function is pivotal in preprocessing images for various computer vision tasks. Histogram equalization remaps pixel intensities using their cumulative distribution function (CDF) to enhance contrast. Through this remapping, the function ensures that the images exhibit a more uniform distribution of pixel intensities. The process can be described mathematically as follows:
First, we compute the CDF of the pixel intensities:

$$\mathrm{CDF}(v) = \frac{1}{N} \sum_{u=0}^{v} h(u)$$

where $h(u)$ is the number of pixels with intensity $u$, and $N$ is the total number of pixels in the image.
Next, we map the intensity values by transforming each pixel intensity $v$ as:

$$v' = \mathrm{round}\big( (L - 1) \cdot \mathrm{CDF}(v) \big)$$

where $L$ is the number of possible intensity levels, and $v'$ is the new intensity value.
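A sketch of the random contrast change and histogram equalization; the contrast range is our assumption, and Pillow's ImageOps.equalize performs the CDF remapping described above:

```python
import random
from PIL import Image, ImageEnhance, ImageOps

def random_contrast_and_equalize(img: Image.Image) -> Image.Image:
    alpha = random.uniform(0.5, 1.5)   # contrast factor alpha
    img = ImageEnhance.Contrast(img.convert("RGB")).enhance(alpha)
    # Histogram equalization: remaps intensities via the CDF transform above.
    return ImageOps.equalize(img)
```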
3.7. Change Sharpness Button
Then, we enhance image sharpness within a random range, as with the other augmentation methods. The sharpness of the image is varied to simulate different camera focus settings:

$$I' = I + \beta \big( I - I_{\text{smooth}} \big)$$

where $\beta$ is the sharpness factor, $I_{\text{smooth}}$ is a smoothed copy of the image, and $I'$ is the detail-enhanced image.
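A sketch of the random sharpness adjustment via Pillow; the range is our assumption, and enhance(1.0) returns the original image:

```python
import random
from PIL import Image, ImageEnhance

def random_sharpness(img: Image.Image) -> Image.Image:
    beta = random.uniform(0.0, 2.0)    # sharpness factor: <1 softens, >1 sharpens
    return ImageEnhance.Sharpness(img.convert("RGB")).enhance(beta)
```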
By iterating the above steps multiple times with different source and target images, a massive dataset can be generated. Each iteration introduces random variations in scaling, positioning, rotation, blurring, color change, noise injection, and other OE-focused methods, resulting in a highly diverse and comprehensive dataset [66,67,68].
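Such an iteration loop could look like the following sketch, composing the helper functions defined above; all names and paths are ours, for illustration:

```python
import itertools
from pathlib import Path
from PIL import Image

ships = [Image.open(p) for p in Path("ships_no_bg").glob("*.png")]
oceans = [Image.open(p) for p in Path("oceans").glob("*.jpg")]
Path("augmented").mkdir(exist_ok=True)

for k, (ship, ocean) in enumerate(itertools.product(ships, oceans)):
    ship_aug = random_flip_rotate(ship)            # random orientation
    blended = resize_and_place(ship_aug, ocean)    # random scale and position
    blended = add_noise_and_blur(blended.convert("RGB"))
    blended = random_contrast_and_equalize(white_balance(blended))
    random_sharpness(blended).save(f"augmented/aug_{k:05d}.jpg")
```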
3.8. Random Crop Function
We perform a random cropping (erasing) operation on an input image. This is a common data augmentation method used to increase the diversity of training datasets by generating multiple variations of the original image. Random cropping can help improve the robustness and generalization of ML models by providing a wider range of training examples. In our approach, we randomly cropped sections from the blended images after the target and source images were combined; the crop may even remove parts of the ship fire itself. The rectangular region is set to 500 by 500 pixels in height and width, respectively. In the Gradio web app, it is possible to specify the height and width of the random rectangular area.
Image cropping (random erasing [69]) is a common technique in image processing and data augmentation. In image recognition, this method mainly addresses occlusion-related challenges, such as partially obscured objects. Mathematically, let $I$ be an image of width $W$ and height $H$. The goal is to crop a region of size $(w, h)$ randomly from $I$, where $w$ and $h$ are the width and height of the cropped region. To determine the cropping coordinates, we randomly select the top-left corner of the cropping region. Let $(x, y)$ be its coordinates; then:

$$x = \mathrm{rand}(0,\ W - w), \qquad y = \mathrm{rand}(0,\ H - h)$$

where $\mathrm{rand}(a, b)$ is a function that returns a random integer between $a$ and $b$.
We then assign transparency to the cropped area, denoted as $T$, a fully transparent image of size $(w, h)$. The transparent region $T$ is then pasted onto the original image $I$, and the resulting image $I'$ is given by:

$$I'(x + i,\ y + j) = T(i, j), \qquad 0 \le i < w,\ 0 \le j < h$$
This equation indicates that the region starting at $(x, y)$ and extending $w$ pixels in width and $h$ pixels in height is replaced by the transparent region $T$. This process effectively creates an occlusion in the image, simulating scenarios where parts of the object are hidden.
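A sketch of the random erasing step, matching the equations above; the default 500-by-500 region follows the text:

```python
import random
from PIL import Image

def random_erase(img: Image.Image, w: int = 500, h: int = 500) -> Image.Image:
    img = img.convert("RGBA").copy()
    W, H = img.size
    x = random.randint(0, max(0, W - w))   # x = rand(0, W - w)
    y = random.randint(0, max(0, H - h))   # y = rand(0, H - h)
    patch = Image.new("RGBA", (w, h), (0, 0, 0, 0))  # transparent region T
    img.paste(patch, (x, y))               # I'(x+i, y+j) = T(i, j)
    return img
```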
As mentioned above, we have developed a Gradio web app, an interface that provides an interactive platform for performing and visualizing various image augmentation operations. This system enables users to upload a source ship image and a target background image to generate a set of augmented images with varying parameters. In Table 4, we can see the selectable values for data augmentation. The augmentation parameters can be customized; for example, the width and height of the cropped area are adjustable from 10 to 500 pixels, and the other parameters similarly offer selectable value ranges.
3.9. Dataset Collection
Figure 2 depicts the Gradio web interface, with an input target and source image. Using the sliders for the OE augmentation functions, we can adjust data generation for our project.
Table 5 provides a detailed summary of the image dataset utilized for ship fire detection, including both original internet-sourced images and their augmented counterparts. The dataset is divided into two primary classes, "Fire" and "No-fire", with a further distinction between internet-sourced images and augmented samples blended with ocean background images. For blending, the backgrounds should represent a wide range of ocean scenarios; similarly, the sea vessels should be varied, since the ocean environment hosts many kinds of vessels. The initial dataset comprises 9200 images of ship fires and 4100 images of ships without fire. To increase the dataset size, a sample of 90 ship fire images was subjected to augmentation, resulting in an expanded dataset of 11,440 images. The extended dataset was generated by blending with 13 background ocean images.
Figure 3 depicts a sample of ship images in an ocean environment. The ships in the images are the main objects, which are extracted from their backgrounds and combined with other ocean images. Ocean and ship images serve as target and source images, respectively. Each augmentation blends a single source ship image with an ocean target image; by varying the blending parameters, we can multiply the number of generated images.
In Figure 4, various ocean environments are represented. An extracted ship image can be placed in different parts of an ocean image using the applied augmentation techniques, such as noise injection (blur effect), rotation, scaling, and random cropping.
4. Experimental Results
In this research, we applied several data augmentation techniques to source and target images to improve ship fire image classification. A sample ship fire (boat) image is shown in Figure 5a. From Figure 5a, the main object is extracted from the ocean environment, as shown in Figure 5b. This is the initial step of the data augmentation.
In Figure 6, OE source images are attached to target ocean background images. The red circles highlight the location of the source image within the ocean image. As can be seen, the augmentation techniques change the ocean image's blur effect and sharpness and the ship image's random location. Using the Gradio web application, we can change the source image location along the x-y axes, along with all the other metrics listed in Table 4. Finally, Figure 7 shows the OE ship fire dataset creation process, beginning with the input ship fire image as the source, progressing to the extraction step, and blending into the target ocean background image with the application of the data augmentation techniques.
To evaluate the augmented dataset's performance, we labeled our dataset and trained Yolo-v8 and Yolo-v10 models on it. Both models' classification accuracy was high; we therefore chose these models to fine-tune for ship fire detection.
Figure 8 and Figure 9 show training image examples with bounding box information for the labeled classes. From the figures, we can see that the ship images in the "No-fire" class are relatively clear to distinguish and identify. In the "Fire" class, by contrast, we achieved our proposed style of dataset: hazy ship fire images with different looks, scales, sizes, and other applied augmentation results. From the comparative analysis in Table 1, it can be concluded that having a large variety of data increases the model's detection and classification accuracy.
We trained both models for 100 epochs (Figure 10). The training and validation loss plots show a decreasing trend, indicating that the models improve as training progresses. The loss curves are generally smooth, though there are some fluctuations in both models' cases, possibly due to challenges in model optimization. In comparison, the training and validation losses of the Yolo-v10 model fine-tuned for ship fire detection are consistently lower and smoother than those of Yolo-v8.
Metrics such as precision, recall, and mean average precision (mAP) improve over the epochs but show irregular behavior, especially after 50 epochs. Addressing these fluctuations is part of our future work on optimizing model training and resolving instability in training and validation performance. The mAP score is a key metric for object detection tasks, measuring the accuracy of the bounding boxes. Yolo-v10 showed a higher and more consistent mAP than Yolo-v8, demonstrating that the fine-tuned Yolo-v10 performs better at detecting and classifying ship fires.
In Figure 11, precision-recall (PR) curves illustrate both models' training performance. For Yolo-v8, the mAP at an intersection-over-union (IoU) threshold of 0.5 reached 0.91, which demonstrates the model's relatively high accuracy in detecting and classifying ship fires. The PR curve for Yolo-v10 demonstrates slightly improved precision at 0.913 and recall of 0.937, indicating a better balance between precision and recall compared to Yolo-v8, whose exact precision and recall scores were 0.866 and 0.912, respectively. The mAP@0.5 score is also higher for Yolo-v10, at 0.937, as can be seen in Figure 11 and Figure 12.
Table 6 compares the achievements of the ship fire detection models. With the applied data augmentation and blending techniques for ship fire images, the custom fine-tuned models, Yolo-v8 and Yolo-v10, clearly outperformed more generalized models such as Yolo-v3-Tiny, Yolo-v5s, and Yolo-v7-Tiny. The fine-tuned Yolo-v10 model stands out, delivering the best precision, recall, F1 score, and mAP@0.5, which indicates the superiority of the proposed method for accurately detecting ship fires in diverse ocean environments.
6. Conclusions
In this study, we presented an OE data augmentation technique based on image blending, which involves automatically removing backgrounds from source images and combining them with target images. This method allows for the generation of massive and diverse datasets, enhancing the training and generalization capabilities of ML models. Many researchers currently focus on GANs or style transfer architectures. However, GANs can produce synthetic data with artifacts and noise that do not appear in real data, and they require significant computational resources. Perez et al. [73], for example, proposed a neural style transfer algorithm that required two sample images; however, their style transfer augmentation approach had no impact on the MNIST problem, a numerical digit dataset. Our approach preserves OE identity and avoids unrealistic artifacts that could mislead the training process and degrade model performance. Moreover, our proposed OE data augmentation methods can iteratively generate huge amounts of data, so that the training of ML models does not suffer from a lack of data and diversity.
We used a dataset of ship images and ocean images as backgrounds to blend ships into ocean environments. The ship images were processed using the background removal algorithm, and the resulting isolated ship images were then merged with the target background images within the OE augmentation pipeline.
In future work, we will further optimize the data augmentation functionality and combine it with Yolo (You Only Look Once) models to automate CV model training in a more streamlined manner.