Article

MSCF-Net: Attention-Guided Multi-Scale Context Feature Network for Ship Segmentation in Surveillance Videos

1 College of Electrical and Information Engineering, Quzhou University, Quzhou 324000, China
2 College of Mechanical Engineering, Quzhou University, Quzhou 324000, China
* Authors to whom correspondence should be addressed.
Mathematics 2024, 12(16), 2566; https://doi.org/10.3390/math12162566
Submission received: 15 July 2024 / Revised: 8 August 2024 / Accepted: 15 August 2024 / Published: 20 August 2024
(This article belongs to the Section Mathematics and Computer Science)

Abstract

With the advent of artificial intelligence, ship segmentation has become a critical component in the development of intelligent maritime surveillance systems. However, as the number of ships grows and the maritime traffic environment becomes more complex, target features in ship images are often indistinct and key details are hard to identify, which makes the segmentation task difficult. To tackle these issues, we present an approach that improves the precision of ship segmentation in complex environments. Firstly, we employ a multi-scale context feature module using different convolutional kernels to extract a richer set of semantic features from the images. Secondly, an enhanced spatial pyramid pooling (SPP) module is integrated into the encoder's final layer, which significantly expands the receptive field and captures a wider range of contextual information. Furthermore, we introduce an attention module with a multi-scale structure to effectively capture the interactions between the encoding and decoding processes and enhance the network's ability to exchange information between layers. Finally, we performed comprehensive experiments on the public SeaShipsSeg and MariBoatsSubclass open-source datasets to validate the efficacy of our approach. Ablation studies demonstrated the effectiveness of each individual component and confirmed its contribution to overall system performance. In addition, comparative experiments with current state-of-the-art algorithms showed that our MSCF-Net excels in both accuracy and robustness. This research establishes a strong foundation for further advances in the accuracy and performance of ship segmentation techniques.

1. Introduction

Throughout history, waterway shipping has been one of the most important modes of transportation. Owing to its low energy consumption, large carrying capacity, and low freight rates, it has been highly valued by countries and regulatory departments. However, the number of ships keeps rising and the water traffic environment has become increasingly complex; incidents such as illegal fishing, overloading, illegal escape, transportation of dangerous goods, and smuggling pose huge challenges to smooth navigation and to the ecological and environmental protection of the relevant waters. In this context, building an independent and efficient intelligent comprehensive supervision platform for water transport and creating a safe and stable shipping environment has become an urgent task. With the widespread installation of surveillance cameras, ship dynamics can be monitored in real time around the clock. However, manual review of this footage is prone to limitations such as operator fatigue and lack of experience, which can lead to reduced recognition accuracy, missed detections, and other issues. With the emergence and rapid development of artificial intelligence, computer vision, pattern recognition, and related technologies have achieved good results on natural scene images; by learning from training samples annotated with bounding boxes, they have been widely adopted for maritime ship detection. Unlike ship detection, semantic segmentation offers pixel-level classification, which reveals the detailed edge information of a ship and supports a better understanding of the scene in surveillance video. Therefore, ship segmentation has become a vital task within intelligent maritime surveillance systems and can provide valuable assistance in ensuring the safety of ship navigation.
At present, research on ship detection and segmentation mostly focuses on satellite remote sensing images, synthetic aperture radar (SAR) images, and infrared images, while studies on visible light images are relatively scarce. Although satellite remote sensing images offer a macro perspective covering wide ocean areas, their low resolution cannot provide detailed observation. SAR images are generated by synthetic aperture radar technology, which offers all-weather, day-and-night imaging and is suitable for various weather and lighting conditions. However, SAR images often contain coherent speckle noise and have a low signal-to-noise ratio, which degrades image quality and hampers subsequent processing. Infrared imaging can penetrate haze and smoke to a certain extent and offers strong detection capability for ships, but the target features in infrared images are usually not distinct enough to identify ship details clearly. In contrast, visible light images contain rich color and texture features and have higher resolution; they not only provide finer visual details but also offer low observation cost and easy acquisition. In addition, compared with long-distance imaging techniques such as SAR and infrared imaging, visible light instance segmentation can effectively handle large targets at short distances, compensating for the limited perception accuracy of the marine environment. Therefore, research on segmentation algorithms based on visible images is of great significance for improving the accuracy of maritime monitoring and advancing ship intelligence.
Visible image segmentation is an important branch of machine vision, and several traditional algorithms exist, such as region-based, fuzzy clustering, and theory-specific methods. However, these conventional methods often suffer from limited representation capability and inefficient feature extraction. As image segmentation technology moves toward greater accuracy, efficiency, and adaptability, deep learning has emerged as a promising solution that meets these requirements. In 2015, Long et al. [1] pioneered the application of fully convolutional networks to image segmentation, which demonstrated the strong learning ability and adaptability of neural networks. Consequently, numerous neural network-based methods have emerged, providing innovative opportunities for achieving precise segmentation. Among the various segmentation algorithms, U-Net [2] stands out as one of the most renowned and widely used. Its underlying principle is a symmetric encoder-decoder structure, which has proven very effective for accurately delineating structures in medical image analysis and other fields. Subsequently, Rampriya et al. [3] introduced a railway semantic segmentation network that incorporates a residual layer, multiple attention layers, and spatial and channel squeeze-and-excitation blocks; it not only uses fewer parameters but also significantly improves on-board processing efficiency, making it suitable for real-time railway semantic segmentation tasks. Similarly, Rashid et al. [4] proposed a spatial attention module that preserves low-level spatial features and a multistage downsampling method that balances accuracy and inference time, supporting real-time processing of high-resolution urban landscape photos. Wu et al. [5] proposed an enhanced MultiResUNet tailored to the specific characteristics of paddy field scenarios; their approach includes an attention-gate mechanism to generate weights that emphasize the response of the field ridge region, and an atrous spatial pyramid pooling block is integrated to enhance the recognition and delineation of small-scale feature details.
Recently, significant efforts have been dedicated to ship segmentation in challenging environments. Among them, Ma et al. [6] built upon FasterYOLO by incorporating an adaptive attention mechanism designed to enhance the critical features of ships; this improvement is coupled with a Transformer architecture, which fuses contextual information to ensure robustness and applicability under extreme and challenging interference conditions. Sun et al. [7] proposed a dual-branch activation network for instance segmentation of ship images, which adopts a pyramid structure in feature encoding to facilitate the extraction of the fine-grained features necessary for accurate segmentation; in feature decoding, the model extracts dual-path mask features, achieving a delicate balance between high accuracy and robustness. Peng et al. [8] proposed an advanced 2D OTSU ship segmentation method, which utilizes a genetic algorithm to dynamically optimize the segmentation threshold and adapt it to the specific features of inland ship images; this adaptive threshold mechanism helps to effectively distinguish ship targets from complex backgrounds, reducing false positives and improving overall segmentation accuracy. Zhang et al. [9] developed an innovative decoder designed to aggregate hierarchical feature maps, which enables the learning of robust representations; this approach captures complex details across multiple feature levels and significantly improves the quality of the final output. Sun et al. [10] introduced an accurate and efficient ship instance segmentation algorithm that utilizes both global and local attention mechanisms; this approach captures the overall context as well as fine detail, addresses various challenges posed by complex marine environments, and advances research and practical applications related to ship segmentation and analysis. Sun et al. [11] introduced an approach for enhancing ship instance segmentation by leveraging region-of-interest pooling and global mask head techniques; their methodology focuses on preserving the crucial global location and semantic information of each instance, which is often essential for accurate segmentation. Yuan et al. [12] introduced an adaptive attention mechanism designed to manage the intricate interplay of features extracted at varying hierarchical levels; this integration strategy not only enhances the precision of the segmentation results but also maintains an impressive processing speed, thereby achieving an optimal balance between high accuracy and efficiency.
Although the aforementioned approaches have significantly advanced the field of ship segmentation, the growing number of ships and the increasingly complex water traffic environment, including rain, haze, and low illumination, seriously degrade the quality of visible images, and accurate segmentation under these unfavorable conditions remains a difficult challenge. To overcome these problems, we propose a ship segmentation method called MSCF-Net. Our approach involves three improvements: a multi-scale context feature module; a spatial pyramid pooling module; and an attention module with a multi-scale structure. The integration of these modules allows our approach to capture richer semantic information and larger receptive fields, and it facilitates the exchange of information between network layers. The contributions of our study can be summarized as follows:
  • We construct a multi-scale context feature module by applying various convolution kernels, which can capture a wider and more diverse set of semantic features from images. This approach enables the network to effectively distinguish between fine-grained details and complex structures in images, thereby enhancing its ability to understand and represent different aspects of visual content;
  • We integrate the enhanced spatial pyramid pooling module into the last layer of the encoder. This advanced module significantly broadens the receptive area, which can better understand and interpret spatial relationships and complex details in images;
  • We integrate an attention module with a multi-scale structure to effectively capture interactions between the encoding and decoding processes. This mechanism enhances the ability of the network to exchange information between layers, significantly improving the segmentation of complex structures and fine details.

2. Materials and Methods

2.1. Network Architecture of Our MSCF-Net

The U-Net is characterized by a U-shaped encoder-decoder structure, which extracts semantics directly by compressing the data in the encoder. However, the semantic information embedded in shallow feature maps differs significantly from that embedded in deep feature maps, and direct skip connections complicate network training and degrade performance. Therefore, we propose MSCF-Net for the precise segmentation of ships in complex environments. Firstly, MSCF-Net uses multiple convolution kernels to construct a multi-scale context feature module, which captures wider and more diversified semantic features from images and improves the network's ability to interpret complex details. Next, an enhanced SPP module is integrated into the final layer of the encoder, which significantly improves the network's ability to understand and interpret spatial relationships and complex image details by expanding the receptive field and capturing broader contextual information. Additionally, MSCF-Net integrates an attention module with a multi-scale structure, which is essential for efficiently capturing the interaction between the encoding and decoding processes. The comprehensive architecture of MSCF-Net is depicted in Figure 1; the network finally outputs binary prediction results. A simplified sketch of how these modules are assembled is given below.
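For illustration only, the following PyTorch sketch wires a shallow encoder-decoder in the spirit of Figure 1. The conv_block, bottleneck, and skip-attention components are simplified stand-ins for the MCFM, SPP, and attention modules detailed in Sections 2.2-2.4, and the depth and channel widths are assumptions rather than the published configuration.

```python
# Minimal, illustrative wiring of the MSCF-Net idea (not the published configuration).
import torch
import torch.nn as nn


def conv_block(c_in, c_out):
    # Stand-in for the multi-scale context feature module (Section 2.2).
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.ReLU(inplace=True)
    )


class MSCFNetSketch(nn.Module):
    def __init__(self, in_ch=3, base=32):
        super().__init__()
        self.enc1 = conv_block(in_ch, base)                # encoder stage 1
        self.enc2 = conv_block(base, base * 2)             # encoder stage 2
        self.bottleneck = conv_block(base * 2, base * 4)   # stand-in for the SPP module (Section 2.3)
        self.skip_att = nn.Sigmoid()                       # stand-in for the attention on skip connections (Section 2.4)
        self.up2 = nn.ConvTranspose2d(base * 4, base * 2, 2, stride=2)
        self.dec2 = conv_block(base * 4, base * 2)
        self.up1 = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec1 = conv_block(base * 2, base)
        self.head = nn.Conv2d(base, 1, 1)                  # binary ship/background prediction
        self.pool = nn.MaxPool2d(2)

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2 * self.skip_att(e2)], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1 * self.skip_att(e1)], dim=1))
        return torch.sigmoid(self.head(d1))


# Example: MSCFNetSketch()(torch.randn(1, 3, 256, 256)).shape -> (1, 1, 256, 256)
```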

2.2. Architecture of Multi-Scale Context Feature Module

In Figure 2a, the encoding and decoding modules within the U-Net are composed of a traditional convolution block, structured as two consecutive 3 × 3 convolutional layers. While this design maintains the simplicity and efficiency of fixed-size kernels, it inherently restricts the network's capacity to capture broader contextual information and distant dependencies across the image. This limitation is particularly significant in domains such as medical imaging, where the size of lesions and abnormalities can vary dramatically from one case to another, and where complex textures and anatomical structures at varying scales require a multi-scale approach for accurate segmentation. To address these difficulties, the traditional convolutional blocks in the U-Net are replaced with multi-scale context feature modules. As illustrated in Figure 2b, the multi-scale context feature module employs a dual-branch asymmetric convolutional combination to optimize performance within the constraints of limited computational resources. This design aims to extract richer semantic information and generate more detailed feature maps. In the upper branch of the network, a combination of 3 × 3 and 5 × 5 convolutions is utilized. These smaller kernels efficiently capture details and local features in an image and provide a balance between computational efficiency and detailed feature extraction, ensuring that the network can process high-resolution images without excessive computational load. The lower branch adopts a combination of larger 7 × 7 and 9 × 9 convolutions. These larger kernels expand the network's receptive field, enabling it to capture more contextual information and understand a wider range of spatial relationships in the image. By incorporating these larger convolutions, the network can better interpret complex scenes and accurately segment objects that vary significantly in size and shape. To ensure that the information extracted from the two branches is effectively integrated, the features from the upper and lower branches are added together; the combined features are then passed through a 1 × 1 convolution to restore the number of channels and fuse the information from all channels. In summary, the multi-scale context feature module adopts a two-branch asymmetric convolution design, which effectively balances detailed feature extraction with computational efficiency. By integrating convolutions of different sizes and a fusion mechanism, the module captures a wide range of semantic information and enhances the overall performance of the network in complex image segmentation tasks. A minimal sketch of this module is given below.
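As a concrete (but unofficial) reading of the above description, the following PyTorch sketch implements a dual-branch module with 3 × 3/5 × 5 and 7 × 7/9 × 9 convolutions, element-wise addition, and a 1 × 1 fusion convolution. The square kernels, intermediate channel width, and BatchNorm/ReLU placement are assumptions; an asymmetric (factorized 1 × n / n × 1) variant could replace the square kernels without changing the overall structure.

```python
# Sketch of the multi-scale context feature module described above (assumed details:
# intermediate width, BatchNorm/ReLU placement; not the authors' released implementation).
import torch
import torch.nn as nn


class MultiScaleContextFeature(nn.Module):
    def __init__(self, c_in, c_out):
        super().__init__()
        mid = c_out // 2
        # Upper branch: smaller 3x3 and 5x5 kernels for local detail.
        self.upper = nn.Sequential(
            nn.Conv2d(c_in, mid, 3, padding=1), nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, 5, padding=2), nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
        )
        # Lower branch: larger 7x7 and 9x9 kernels for wider context.
        self.lower = nn.Sequential(
            nn.Conv2d(c_in, mid, 7, padding=3), nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, 9, padding=4), nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
        )
        # 1x1 convolution to restore the channel count and fuse the summed branches.
        self.fuse = nn.Conv2d(mid, c_out, 1)

    def forward(self, x):
        return self.fuse(self.upper(x) + self.lower(x))


# Example: MultiScaleContextFeature(64, 128)(torch.randn(1, 64, 64, 64)).shape -> (1, 128, 64, 64)
```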

2.3. Architecture of Spatial Pyramid Pooling Module

As depicted in Figure 3a, the spatial pyramid pooling structure effectively captures multi-scale information by utilizing dilation rates of various sizes. In this architecture, feature extraction is handled by separate branches, each dedicated to a specific scale; at the end of the network, the information from these branches is fused and output by a convolution layer. This method is particularly effective in avoiding redundant information acquisition within the encoder and focuses on identifying and analyzing the correlations between different objects. Inspired by the principles of pyramidal structures [13,14] and the techniques used in atrous convolution [15,16], we designed an advanced spatial pyramid pooling module. This module is strategically positioned at the bottom of the encoder, as depicted in Figure 3b, to significantly improve the network's ability to obtain and interpret spatial relationships within the images. Specifically, our SPP module operates sequentially using three convolutional layers with dilation rates of 2, 4, and 6. The first layer, with a dilation rate of 2, focuses on capturing finer details, while the subsequent layers, with dilation rates of 4 and 6, progressively capture broader contextual information. This design allows the network to gather contextual information at different scales, enhancing its ability to detect local and global features in images. After the multi-scale convolutions, a cascaded dilated convolution module is introduced. This module includes multiple branches: a 1 × 1 convolution branch for fine-grained feature extraction; 3 × 3 convolution branches with different dilation rates for collecting different attributes; and a pooling branch for summarizing the feature maps. The outputs of these branches are then concatenated to form a comprehensive feature map that integrates multi-scale information. Next, the concatenated feature maps undergo channel adjustment through a 1 × 1 convolution to harmonize the number of channels. The adjusted feature map is subsequently fed into the decoder, ensuring that the decoder receives a rich and well-integrated representation of the input image. The main advantage of the SPP mechanism is that multi-scale information can be obtained through atrous convolution at different dilation rates, which effectively compensates for the global feature information lost during continuous downsampling. By capturing and retaining a wide range of features, the SPP module improves the overall segmentation accuracy and robustness. A minimal sketch of this module follows.
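The sketch below is one hedged interpretation of this module in PyTorch: three dilated 3 × 3 convolutions applied in series (rates 2, 4, 6), followed by parallel 1 × 1, dilated 3 × 3, and global-pooling branches whose outputs are concatenated and projected back by a 1 × 1 convolution. The channel widths, the exact branch dilation rates, and the normalization are assumptions.

```python
# Sketch of the spatial pyramid pooling module described above; layer widths,
# branch dilation rates, and normalization are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SPPModule(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # Sequential dilated 3x3 convolutions with rates 2, 4, and 6.
        self.serial = nn.Sequential(*[
            nn.Sequential(nn.Conv2d(channels, channels, 3, padding=r, dilation=r),
                          nn.BatchNorm2d(channels), nn.ReLU(inplace=True))
            for r in (2, 4, 6)
        ])
        # Parallel branches: 1x1, dilated 3x3 convolutions, and a global-pooling branch.
        self.branch1 = nn.Conv2d(channels, channels, 1)
        self.branch2 = nn.Conv2d(channels, channels, 3, padding=2, dilation=2)
        self.branch3 = nn.Conv2d(channels, channels, 3, padding=4, dilation=4)
        self.pool_branch = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Conv2d(channels, channels, 1))
        # 1x1 convolution to bring the concatenated map back to the original width.
        self.project = nn.Conv2d(channels * 4, channels, 1)

    def forward(self, x):
        x = self.serial(x)
        pooled = F.interpolate(self.pool_branch(x), size=x.shape[-2:],
                               mode="bilinear", align_corners=False)
        out = torch.cat([self.branch1(x), self.branch2(x), self.branch3(x), pooled], dim=1)
        return self.project(out)


# Example: SPPModule(128)(torch.randn(1, 128, 16, 16)).shape -> (1, 128, 16, 16)
```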

2.4. Architecture of Attention Module with Multi-Scale Structure

Inspired by the architectural principles of InceptionNet, we propose a novel attention module with a multi-scale structure, as shown in Figure 4. This module is meticulously designed to enhance the extraction of both local features and their surrounding contextual information, thus improving the overall feature representation. Specifically, we adopted a three-branch structure in the feature extraction module, which allows us to aggregate information from local regions of different sizes for effective multi-scale feature extraction. To optimize computational efficiency while maintaining a large receptive field, our three convolutional branches are composed of two consecutive convolutions, each parameterized as 1 × n and n × 1, with n set to 3, 7, and 11, respectively. This design choice ensures that we can extract diverse scale features without a significant increase in computational cost. In addition, we introduce an attention refinement module (ARM) [17] between the two convolutions within each branch. As depicted in Figure 5, the ARM comprises four main elements: global average pooling; two 1 × 1 convolutional layers; batch normalization; and the Sigmoid function. By integrating these components, the ARM significantly boosts the model’s ability to discern and amplify crucial features. Then, the outputs from the three branches are joined together along the channel dimension to form a unified feature map. This aggregated feature map is subjected to a 1 × 1 convolution to integrate the information from different scales into a coherent representation. Subsequently, the resulting feature map is processed using the sigmoid function to produce an attention score matrix. Finally, the attention score matrix is multiplied by the original input features to generate the refined feature representation.
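To make the branch layout concrete, here is a hedged PyTorch sketch of the attention module and the ARM described above: three 1 × n / n × 1 branches (n = 3, 7, 11) with an ARM between the paired convolutions, channel-wise concatenation, a 1 × 1 fusion convolution, a sigmoid score map, and multiplication with the input. The ARM's internal channel handling and the ReLU between its two 1 × 1 convolutions are assumptions.

```python
# Sketch of the attention module with a multi-scale structure and the ARM described above;
# channel handling, normalization, and activation details are assumptions.
import torch
import torch.nn as nn


class AttentionRefinement(nn.Module):
    """ARM: global average pooling, two 1x1 convolutions, batch normalization, sigmoid gate."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, 1), nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.gate(x)  # channel-wise reweighting of the input features


class MultiScaleAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.branches = nn.ModuleList()
        for n in (3, 7, 11):
            self.branches.append(nn.Sequential(
                nn.Conv2d(channels, channels, (1, n), padding=(0, n // 2)),  # 1 x n convolution
                AttentionRefinement(channels),                               # ARM between the two convolutions
                nn.Conv2d(channels, channels, (n, 1), padding=(n // 2, 0)),  # n x 1 convolution
            ))
        self.fuse = nn.Conv2d(channels * 3, channels, 1)

    def forward(self, x):
        scores = torch.sigmoid(self.fuse(torch.cat([b(x) for b in self.branches], dim=1)))
        return x * scores  # refined features = attention scores applied to the original input


# Example: MultiScaleAttention(64)(torch.randn(2, 64, 32, 32)).shape -> (2, 64, 32, 32)
```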

2.5. Loss Function

In our study, each image in the dataset is meticulously annotated with a corresponding binary mask to facilitate precise segmentation. To address the inherent challenges associated with ship segmentation, we use the dice function [18,19] as a loss calculation strategy, which can be formulated as follows:
L_{dice}(y, p) = 1 - \frac{2 \sum_{i=1}^{N} p_i y_i}{\sum_{i=1}^{N} y_i + \sum_{i=1}^{N} p_i}
where N is the number of pixels, and y_i and p_i are the ground-truth label and predicted value of pixel i, respectively.
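A minimal PyTorch version of this loss, for illustration, might look like the following; the epsilon term for numerical stability is an implementation choice not stated above.

```python
# Dice-loss sketch matching the formula above (epsilon added for numerical stability).
import torch


def dice_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """pred: predicted probabilities in [0, 1]; target: binary ground-truth mask."""
    pred = pred.reshape(pred.shape[0], -1)
    target = target.reshape(target.shape[0], -1)
    intersection = (pred * target).sum(dim=1)
    denom = pred.sum(dim=1) + target.sum(dim=1)
    return (1.0 - (2.0 * intersection + eps) / (denom + eps)).mean()
```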

3. Experiments

3.1. Dataset Description

To rigorously assess the performance of MSCF-Net on the ship segmentation task, we selected two comprehensive ship image datasets. Due to the varying shapes, sizes, and orientations of ships, as well as the changing and complex marine environment, ship segmentation faces major challenges. Recognizing the computational constraints and the need for consistency in data preprocessing, we resized each image to 256 × 256 pixels. The two ship datasets contain a variety of scenarios, including ships docked in a port, ships sailing in open water, and ships partially obscured by other objects or environmental elements. Each image is accompanied by a carefully labeled ground-truth mask that delineates the exact area occupied by the ship. Details of each dataset are as follows:

3.1.1. SeaShipsSeg

SeaShipsSeg [9] is an advanced and meticulously curated collection designed specifically for ship segmentation tasks. It is derived from the well-known ship detection database SeaShips [20], which offers comprehensive coverage of ships at sea under various conditions. Zhang et al. carefully selected the images and labeled them using Labelme 3.16.7 software to ensure the accurate delineation of ship boundaries. To enhance the robustness and generalization of segmentation algorithms, they introduced simulated degradation effects such as haze, rain, and low-light conditions. Figure 6 provides a visual illustration of the ships in the SeaShipsSeg dataset and their corresponding annotations. In our study, we used 768 images for training, 192 for validation, and 240 for testing. The dataset can be downloaded at https://github.com/GrimreaperZ-creator/SeaShipsSeg, accessed on 1 May 2024.

3.1.2. MariBoatsSubclass

Sun et al. created a comprehensive ship segmentation dataset by sourcing image data from the Google image platform. They meticulously segmented and labeled the selected ocean ship images using Labelme 3.16.7 software to obtain a ship instance segmentation dataset named MariBoatsSubclass [10]. To improve the generalization ability of algorithms under different conditions, at least one of the data augmentation methods of horizontal flipping, scaling, and multi-scale input was applied to expand the dataset. The MariBoatsSubclass dataset contains 3125 images in total, which meets basic instance segmentation requirements. Figure 7 provides a visual illustration of the ships in the MariBoatsSubclass dataset and their corresponding annotations. For training, the dataset was divided into three parts: 1876 images for training; 625 for validation; and 624 for testing. This dataset can be obtained from https://github.com/s2120200252/Visible-ship-dataset, accessed on 1 May 2024.

3.2. Evaluation Metrics

To comprehensively evaluate the performance of MSCF-Net, four key metrics were employed: dice [21,22]; recall [23,24]; Matthews correlation coefficient (Mcc) [25,26]; and Jaccard [27,28]. Each metric was calculated using the following formulas:
\mathrm{Dice} = \frac{2TP}{2TP + FN + FP}
\mathrm{Recall} = \frac{TP}{TP + FN}
\mathrm{Mcc} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FN)(TP + FP)(TN + FN)(TN + FP)}}
\mathrm{Jaccard} = \frac{TP}{TP + FN + FP}
where TP, TN, FP, and FN denote the numbers of true positive, true negative, false positive, and false negative pixels, respectively.
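For reference, the sketch below computes the four metrics from binary masks via the TP/TN/FP/FN counts; the 0.5 threshold and the small epsilon are assumptions added for numerical safety.

```python
# Sketch of the four evaluation metrics computed from confusion counts.
import torch


def segmentation_metrics(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6):
    pred = (pred > 0.5).float()
    target = (target > 0.5).float()
    tp = (pred * target).sum()
    tn = ((1 - pred) * (1 - target)).sum()
    fp = (pred * (1 - target)).sum()
    fn = ((1 - pred) * target).sum()
    dice = (2 * tp) / (2 * tp + fn + fp + eps)
    recall = tp / (tp + fn + eps)
    mcc = (tp * tn - fp * fn) / torch.sqrt((tp + fn) * (tp + fp) * (tn + fn) * (tn + fp) + eps)
    jaccard = tp / (tp + fn + fp + eps)
    return {"dice": dice.item(), "recall": recall.item(), "mcc": mcc.item(), "jaccard": jaccard.item()}
```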

3.3. Implementation Details

In a Windows 10 environment, all models were constructed using the PyTorch framework and trained on a GeForce RTX 4090 GPU. The Adam optimizer [29], initialized with a learning rate of 1 × 10−3, was employed to train the models over 200 epochs with a batch size of 16. To maintain consistency and comparability between experiments, the same parameter settings and loss function were applied to both datasets during the training of all models. Throughout the training and validation stages, the loss and accuracy of the models were monitored and recorded, as illustrated in Figure 8. Notably, on the SeaShipsSeg dataset, MSCF-Net showed a rapid reduction in loss over the first 100 epochs; during the later stages, both the loss and accuracy metrics remained nearly consistent between the training set and the validation set. This consistency shows that MSCF-Net generalizes effectively from training data to unseen data, demonstrating its robustness and adaptability. Conversely, on the MariBoatsSubclass dataset, there is a noticeable gap between the loss and accuracy of the training set and those of the validation set; however, after about 150 epochs, these curves begin to level off. This stability reflects the convergence of the model and indicates that the design and training strategy are effective. The convergence and stabilization of these metrics demonstrate that MSCF-Net is not only capable of learning complex patterns within the data but also of maintaining performance across different datasets, which is essential for reliable operation in real-world ship segmentation tasks. A minimal sketch of the training setup is given below.
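The following sketch reflects the reported training settings (Adam, learning rate 1e-3, 200 epochs, batch size 16); the `model`, `train_set`, and `dice_loss` arguments are hypothetical placeholders standing in for the network, dataset, and loss described earlier.

```python
# Minimal training-loop sketch using the settings reported above.
import torch
from torch.utils.data import DataLoader


def train(model, train_set, dice_loss, device="cuda"):
    loader = DataLoader(train_set, batch_size=16, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    model.to(device).train()
    for epoch in range(200):
        running = 0.0
        for images, masks in loader:
            images, masks = images.to(device), masks.to(device)
            optimizer.zero_grad()
            loss = dice_loss(model(images), masks)
            loss.backward()
            optimizer.step()
            running += loss.item()
        print(f"epoch {epoch + 1}: mean loss {running / len(loader):.4f}")
```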

3.4. Ablation Experiments

To evaluate the effectiveness of each module in MSCF-Net, we performed a series of ablation studies. We started with a five-layer U-Net as the baseline and integrated the multi-scale context feature module (MCFM), the spatial pyramid pooling module (SPPM), and the proposed attention module with a multi-scale structure (AMMS) into the baseline network in sequence. The ablation results on the SeaShipsSeg dataset are listed in Table 1. Firstly, the impact of adding the SPPM to the baseline network is evident in the performance metrics: with SPPM, dice increased by 1.78%, recall by 5.00%, Mcc by 1.67%, and Jaccard by 3.11%. These improvements indicate that the SPPM enhances the network's ability to obtain and utilize multi-scale features from the bottleneck layer. Comparing the models before and after adding AMMS, dice, recall, Mcc, and Jaccard improved by 1.73%, 3.57%, 1.65%, and 2.88%, respectively. This improvement can be attributed to AMMS's capability to enhance image edge information through dimensional attention, which effectively refines the segmentation boundaries and improves overall accuracy. The addition of MCFM resulted in increases of 1.92% in dice, 6.13% in recall, 1.84% in Mcc, and 3.37% in Jaccard. This considerable enhancement underscores the importance of MCFM in capturing semantic information. When the baseline network was combined with SPPM, AMMS, and MCFM, the experimental results showed a synergistic effect, yielding the highest segmentation performance across all metrics. This combination underscores the rationality and effectiveness of the proposed network modules.
To further confirm the effectiveness of the improved modules, we thoroughly tested each module before and after applying the improvements. As shown in Table 2, with the improvement in traditional convolutional blocks, MCFM significantly improves the accuracy and performance indicators of the model. Similarly, the SPPM and AMMS modules have also made remarkable progress after improvement. These findings underscore the value of our approach and confirm that modifications successfully improve the overall performance and effectiveness of this model.

3.5. Comparison with the State-of-the-Art Algorithms

To demonstrate the effectiveness of MSCF-Net, we performed comparative experiments with U-Net [2], MHorUNet [23], BCMNet [30], ESFNet [31], FSSNet [32], HFENet [33], LMFFNet [34], CGRNet [35], MSFCN [36], CSCAUNet [37], SCSONet [38], LANet [39], AMFU_Net [40], MSCSFNet [41], and MCCNet_VGG [42]. The evaluation metrics considered for this comparison were dice, recall, Mcc, and Jaccard, which are crucial for assessing the accuracy and reliability of the segmentation approach. Tables 3 and 4 present the quantitative comparison of the above algorithms on SeaShipsSeg and MariBoatsSubclass, respectively.

3.5.1. Results on SeaShipsSeg

According to Table 3, U-Net achieved a dice of 0.9210, recall of 0.8742, Mcc of 0.9201, and Jaccard of 0.8568, indicating that it performs well in segmenting ship areas, although its recall still falls short. MHorUNet obtained a dice of 0.9079, recall of 0.8949, Mcc of 0.9044, and Jaccard of 0.8359; although its recall is slightly higher than U-Net's, indicating better detection ability, its overall segmentation performance is slightly lower. BCMNet achieved a dice of 0.9324, recall of 0.9297, Mcc of 0.9296, and Jaccard of 0.8756, reflecting balanced and reliable segmentation. The low scores of ESFNet across all indicators point to its poor performance in accurately segmenting ship images. FSSNet showed robust results, with a dice of 0.9312, recall of 0.9228, Mcc of 0.9284, and Jaccard of 0.8730, placing it close to BCMNet. HFENet, LMFFNet, and CGRNet all achieved good performance, showing that these three methods can effectively segment ship images. MSFCN and CSCAUNet achieved the metrics closest to those of our proposed method, reflecting their strong overall performance and reliability in ship segmentation tasks. SCSONet achieved a dice of 0.9076, recall of 0.8980, Mcc of 0.9039, and Jaccard of 0.8347, a relatively poor result. LANet posted a dice of 0.9308, recall of 0.9196, Mcc of 0.9280, and Jaccard of 0.8726, showing reliable performance across the metrics. AMFU_Net performs well on all metrics, demonstrating its effectiveness in handling complex segmentation tasks. MSCSFNet demonstrates a well-rounded performance but falls slightly behind the leading models in dice and Jaccard. MCCNet_VGG achieves the highest Mcc and Jaccard among the existing models, indicating strong precision and recall. MSCF-Net outperformed all of these models with a dice of 0.9502, recall of 0.9498, Mcc of 0.9481, and Jaccard of 0.9066. These results clearly demonstrate MSCF-Net's superior accuracy, robustness, and adaptability in ship segmentation tasks, affirming its effectiveness in achieving high-quality segmentation outcomes.
To provide a clear and detailed visual comparison of segmentation performance across the different models, Figure 9 shows visualizations of MSCF-Net alongside U-Net, MHorUNet, BCMNet, ESFNet, FSSNet, HFENet, LMFFNet, CGRNet, MSFCN, CSCAUNet, SCSONet, and LANet. With the U-Net baseline, many ships are not sufficiently captured, and some instances in complex backgrounds are incorrectly segmented, leading to a large number of false positives. While the other models offer some improvements over U-Net, they still face significant challenges in handling complex scenarios: they often miss finer details and fail to depict ship boundaries accurately, resulting in a lack of precision and completeness in the segmentation. In sharp contrast, MSCF-Net demonstrates markedly superior segmentation performance. It excels at accurately identifying and segmenting large, irregularly shaped ships with intricate boundaries. This capability significantly reduces segmentation errors and missed segmentations, making it well suited for complex ship segmentation tasks in a variety of challenging environments.

3.5.2. Results on MariBoatsSubclass

In addition, to thoroughly assess the generalization performance of MSCF-Net, we performed additional experiments on a set of ship images and reported the numerical statistics of the MariBoatsSubclass dataset in Table 4. We also presented the corresponding visualization results in Figure 10 to provide a comprehensive comparison. Our analysis reveals that MSCF-Net outperforms the baseline U-Net by significant margins, achieving improvements of 5.09% in Dice, 11.91% in recall, 5.69% in Mcc, and 7.89% in Jaccard. These results highlight the enhanced segmentation capabilities of MSCF-Net, especially in accurately identifying and delineating ship boundaries. Furthermore, MSCF-Net also surpasses the second-best performing model, MCCNet_VGG, with improvements of 0.31% in Dice, 0.12% in recall, 0.42% in Mcc, and 0.50% in Jaccard. These marginal yet notable gains underscore the superiority of MSCF-Net in handling the complexities of ship image segmentation. The visualization results in Figure 10 clearly demonstrate that MSCF-Net produces more accurate and finer segmentation outputs compared to other models. This is particularly evident in scenarios involving complex backgrounds and irregularly shaped ships, where MSCF-Net excels in maintaining the integrity and precision of the segmented regions. The visual comparisons further validate the robustness and adaptability of MSCF-Net, making it an excellent choice for real-world ship segmentation tasks across diverse and challenging environments.

3.5.3. Computational Complexity

To thoroughly assess the computational efficiency of MSCF-Net, we performed a comparative analysis with other prominent models in the field. Table 5 presents a detailed comparison of computational complexity across the networks, focusing on key metrics such as the number of parameters and the frames per second (FPS) during inference. These metrics provide a comprehensive overview of the trade-offs between model size, computational cost, and operational speed. According to Table 5, U-Net and HFENet demonstrate fast operating speeds, and models such as ESFNet, FSSNet, CGRNet, MSFCN, and LANet also have significant efficiency advantages; however, these models often fall short in segmentation performance. In contrast, our MSCF-Net achieves a more comprehensive balance between high precision and operational efficiency. This balance is critical for applications that require real-time processing, such as the timely and accurate segmentation demanded by intelligent maritime surveillance systems. Although MSCF-Net reaches a good balance, further reducing its computational complexity remains a key direction for future research. In addition, the computational complexity of each model in the ablation experiments was compared. As shown in Table 1, both the number of parameters and the computational load increase as each module is added; however, the significant improvements in model performance justify this increase. The data in Table 2 show that even with improvements to the original modules, the number of parameters and the complexity change little, while the performance improves considerably. This shows that thoughtful modifications and enhancements can lead to better performance without significantly increasing the computational burden. Overall, the results of our comprehensive analysis and ablation experiments underscore the validity of the proposed modules. Each module contributes to improving the performance of MSCF-Net, demonstrating the potential for future advances in segmentation accuracy and computational efficiency.

4. Conclusions

In the intricate and often unpredictable maritime environment, ship segmentation has emerged as a crucial task for intelligent maritime surveillance systems. In this study, we introduce a multi-scale context feature network with an attention-guided module specifically designed for ship segmentation in surveillance videos. Our method leverages the power of multi-scale context features to capture varying levels of detail and the attention-guided mechanism to improve the precision of segmentation, even in the presence of complex backgrounds and varying lighting conditions. To prove the efficacy of our approach, we performed comprehensive experiments on the public SeaShipsSeg and MariBoatsSubclass open-source datasets. Through meticulous ablation studies, we confirmed the contribution of each individual component of MSCF-Net. Furthermore, we performed comparative experiments with current state-of-the-art algorithms to highlight the advantages of MSCF-Net. The results showed that MSCF-Net not only surpasses existing algorithms in accuracy but also exhibits superior robustness across various challenging scenarios. In the future, we aim to extend the application of our MSCF-Net to the detection and recognition of ship identification in natural scenes. This work will involve adapting MSCF-Net to not only segment ships but also accurately identify ship names, further enhancing its utility for maritime surveillance and security operations.

Author Contributions

Conceptualization, X.J. (Xiaodan Jiang) and X.D.; methodology, X.J. (Xiaodan Jiang); writing—original draft preparation, X.J. (Xiaodan Jiang) and X.D.; writing—review and editing, X.J. (Xiaodan Jiang) and X.J. (Xiaoliang Jiang). All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (grant number 62102227); the Zhejiang Basic Public Welfare Research Project (grant numbers LTGC23E050001, LGN21C130001, and LGF21F010002); and the Science and Technology Major Projects of Quzhou (grant number 2023K221).

Data Availability Statement

The authors have used publicly available data in this manuscript. The dataset link is mentioned in the paper.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. IEEE Trans. Pattern Anal. 2015, 39, 640–651.
2. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention; Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241.
3. Rampriya, R.S.; Nathan, S.; Suganya, R.; Prathiba, S.B.; Perumal, P.S.; Wang, W. Lightweight railroad semantic segmentation network and distance estimation for railroad Unmanned aerial vehicle images. Eng. Appl. Artif. Intel. 2024, 134, 108620.
4. Rashid, K.I.; Yang, C.H.; Huang, C.X. Fast-DSAGCN: Enhancing semantic segmentation with multifaceted attention mechanisms. Neurocomputing 2024, 587, 127625.
5. Wu, X.L.; Fang, P.; Liu, X.; Liu, M.H.; Huang, P.C.; Duan, X.H.; Huang, D.K.; Liu, Z.P. AM-UNet: Field ridge segmentation of paddy field images based on an improved MultiResUNet network. Agriculture 2024, 14, 637.
6. Ma, F.; Kang, Z.; Chen, C.; Sun, J.; Deng, J.Z. MrisNet: Robust ship instance segmentation in challenging marine radar environments. J. Mar. Sci. Eng. 2024, 12, 72.
7. Sun, Y.X.; Su, L.; Yuan, S.Z.; Meng, H. DANet: Dual-branch activation network for small object instance segmentation of ship images. IEEE Trans. Circ. Syst. Vid. 2023, 33, 6708–6720.
8. Peng, Z.B.; Wang, L.M.; Tong, L.; Zou, H.; Liu, D.; Zhang, C.Y. Multi-threshold image segmentation of 2D OTSU inland ships based on improved genetic algorithm. PLoS ONE 2023, 18, e0290750.
9. Zhang, Y.Q.; Li, C.F.; Shang, S.P.; Chen, X.Q. SwinSeg: Swin transformer and MLP hybrid network for ship segmentation in maritime surveillance system. Ocean Eng. 2023, 281, 114885.
10. Sun, Z.Q.; Meng, C.N.; Huang, T.; Zhang, Z.Q.; Chang, S.J. Marine ship instance segmentation by deep neural networks using a global and local attention (GALA) mechanism. PLoS ONE 2023, 18, e0279248.
11. Sun, Y.; Su, L.; Luo, Y.; Meng, H.; Li, W.; Zhang, Z.; Wang, P.; Zhang, W. Global Mask R-CNN for marine ship instance segmentation. Neurocomputing 2022, 480, 257–270.
12. Yuan, M.; Meng, H.; Wu, J. AM YOLO: Adaptive multi-scale YOLO for ship instance segmentation. J. Real-Time Image Pr. 2024, 21, 100.
13. Zhao, W.H.; Cao, J.N.; Dong, X.Y. U-shaped contourlet network for high-spatial-resolution remote sensing images segmentation. J. Appl. Remote Sens. 2023, 17, 034509.
14. Li, Z.K.; Liu, Y.F.; Li, B.; Feng, B.L.; Wu, K.B.; Peng, C.W.; Hu, W.M. SDTP: Semantic-aware decoupled transformer pyramid for dense image prediction. IEEE Trans. Circ. Syst. Vid. 2022, 32, 6160–6173.
15. Wu, L.J.; Qiu, S.D.; Chen, Z.C. Real-time semantic segmentation network based on parallel atrous convolution for short-term dense concatenate and attention feature fusion. J. Real-Time Image Pr. 2024, 21, 74.
16. Reddy, B.S.; Sathish, A. A multiscale atrous convolution-based adaptive ResUNet3+ with attention-based ensemble convolution networks for brain tumour segmentation and classification using heuristic improvement. Biomed. Signal Proces. 2024, 91, 105900.
17. Yu, C.; Wang, J.; Peng, C.; Gao, C.; Yu, G.; Sang, G. Bisenet: Bilateral segmentation network for real-time semantic segmentation. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018; pp. 325–341.
18. Luo, H.; Zhou, D.M.; Cheng, Y.J.; Wang, S.Q. MPEDA-Net: A lightweight brain tumor segmentation network using multi-perspective extraction and dense attention. Biomed. Signal Proces. 2024, 91, 106054.
19. Yuan, H.J.; Chen, L.N.; He, X.F. MMUNet: Morphological feature enhancement network for colon cancer segmentation in pathological images. Biomed. Signal Proces. 2024, 91, 105927.
20. Shao, Z.; Wu, W.; Wang, Z.; Du, W.; Li, C. Seaships: A large-scale precisely annotated dataset for ship detection. IEEE Trans. Multimed. 2018, 20, 2593–2604.
21. Selvaraj, A.; Nithiyaraj, E. CEDRNN: A convolutional encoder-decoder residual neural network for liver tumour segmentation. Neural Process. Lett. 2023, 55, 1605–1624.
22. Nham, D.N.; Trinh, M.N.; Nguyen, V.D.; Pham, V.; Tran, T.T. An effcientNet-encoder U-Net joint residual refinement module with Tversky-Kahneman Baroni-Urbani-Buser loss for biomedical image segmentation. Biomed. Signal Proces. 2023, 83, 104631.
23. Wu, R.; Liang, P.; Huang, X.; Shi, L.; Gu, Y.; Zhu, H.; Chang, Q. MHorUNet: High-order spatial interaction UNet for skin lesion segmentation. Biomed. Signal Proces. 2024, 88, 105517.
24. He, J.; Zhang, M.; Li, W.; Peng, Y.; Fu, B.; Liu, C.; Wang, J.; Wang, R. SaB-Net: Self-attention backward network for gastric tumor segmentation in CT images. Comput. Biol. Med. 2024, 169, 107866.
25. Nag, S.; Makwana, D.; Mittal, S.; Mohan, C.K. WaferSegClassNet-A light-weight network for classification and segmentation of semiconductor wafer defects. Comput. Ind. 2022, 142, 103720.
26. Li, Y.; Zhang, Y.; Liu, J.Y.; Wang, K.; Zhang, K.; Zhang, G.S.; Liao, X.F.; Yang, G. Global transformer and dual local attention network via deep-shallow hierarchical feature fusion for retinal vessel segmentation. IEEE Trans. Cybern. 2023, 53, 5826–5839.
27. Yang, C.; Li, B.; Xiao, Q.; Bai, Y.; Li, Y.; Li, Z.; Li, H.; Li, H. LA-Net: Layer attention network for 3D-to-2D retinal vessel segmentation in OCTA images. Phys. Med. Biol. 2024, 69, 045019.
28. Huang, Z.; Xie, F.; Qing, W.; Wang, M.; Liu, M.; Sun, D. MGF-net: Multi-channel group fusion enhancing boundary attention for polyp segmentation. Med. Phys. 2024, 51, 407–418.
29. Ji, M.M.; Wu, Z.B. Automatic detection and severity analysis of grape black measles disease based on deep learning and fuzzy logic. Comput. Electron. Agr. 2022, 193, 106718.
30. Cheng, J.; Wu, Z.; Wang, S.; Demonceaux, C.; Jiang, Q. Bidirectional collaborative mentoring network for marine organism detection and beyond. IEEE Trans. Circ. Syst. Vid. 2023, 33, 6595–6608.
31. Lin, J.; Jing, W.; Song, H.; Chen, G. ESFNet: Efficient network for building extraction from high-resolution aerial images. IEEE Access 2019, 7, 54285–54294.
32. Zhang, X.; Chen, Z.; Wu, Q.J.; Cai, L.; Lu, D.; Li, X. Fast semantic segmentation for scene perception. IEEE Trans. Ind. Inform. 2018, 15, 1183–1192.
33. Lu, F.; Zhang, Z.; Guo, L.; Chen, J.; Zhu, Y.; Yan, K.; Zhou, X. HFENet: A lightweight hand-crafted feature enhanced CNN for ceramic tile surface defect detection. Int. J. Intell. Syst. 2022, 37, 10670–10693.
34. Shi, M.; Shen, J.; Yi, Q.; Weng, J.; Huang, Z.; Luo, A.; Zhou, Y. LMFFNet: A well-balanced lightweight network for fast and accurate semantic segmentation. IEEE Trans. Neural Netw. Learn. 2022, 34, 3205–3219.
35. Wang, K.; Zhang, X.; Lu, Y.; Zhang, X.; Zhang, W. CGRNet: Contour-guided graph reasoning network for ambiguous biomedical image segmentation. Biomed. Signal Proces. 2022, 75, 103621.
36. Li, R.; Zheng, S.; Duan, C.; Wang, L.; Zhang, C. Land cover classification from remote sensing images based on multi-scale fully convolutional network. Geo-Spat. Inf. Sci. 2022, 25, 278–294.
37. Shu, X.; Wang, J.; Zhang, A.; Shi, J.; Wu, X.J. CSCA U-Net: A channel and space compound attention CNN for medical image segmentation. Artif. Intell. Med. 2024, 150, 102800.
38. Chen, H.; Li, Z.; Huang, X.; Peng, Z.; Deng, Y.; Tang, L.; Yin, L. SCSONet: Spatial-channel synergistic optimization net for skin lesion segmentation. Front. Phys. 2024, 12, 1388364.
39. Ding, L.; Tang, H.; Bruzzone, L. LANet: Local attention embedding to improve the semantic segmentation of remote sensing images. IEEE Trans. Geosci. Remote Sens. 2020, 59, 426–435.
40. Chung, W.Y.; Lee, I.H.; Park, C.G. Lightweight infrared small target detection network using full-scale skip connection U-Net. IEEE Geosci. Remote Sens. 2023, 20, 7000705.
41. Liu, Y.; Li, H.; Cheng, J.; Chen, X. MSCAF-net: A general framework for camouflaged object detection via learning multi-scale context-aware features. IEEE Trans. Circ. Syst. Vid. 2023, 33, 4934–4947.
42. Li, G.; Liu, Z.; Lin, W.; Ling, H. Multi-content complementation network for salient object detection in optical remote sensing images. IEEE Geosci. Remote Sens. 2021, 60, 1–13.
Figure 1. Network architecture of our MSCF-Net.
Figure 2. Network architecture of multi-scale context feature module.
Figure 3. Network architecture of the proposed spatial pyramid pooling module.
Figure 4. Network architecture of attention module with multi-scale structure.
Figure 5. Network architecture of attention refinement module.
Figure 6. The visual illustration of the ships in SeaShipsSeg and corresponding annotations.
Figure 7. The visual illustration of the ships in MariBoatsSubclass and corresponding annotations.
Figure 8. Loss and accuracy curves of MSCF-Net on training and verification. The first row: results on SeaShipsSeg dataset. The second row: results on MariBoatsSubclass dataset.
Figure 9. Visual comparison on the SeaShipsSeg dataset. The first and second rows are the original images and their corresponding annotations. The third and last rows are the segmentation results of U-Net, MHorUNet, BCMNet, ESFNet, FSSNet, HFENet, LMFFNet, CGRNet, MSFCN, CSCAUNet, SCSONet, LANet, AMFU_Net, MSCSFNet, MCCNet_VGG, and MSCF-Net.
Figure 10. Visual comparison on the MariBoatsSubclass dataset. The first and second rows are the original images and their corresponding annotations. The third and last rows are the segmentation results of U-Net, MHorUNet, BCMNet, ESFNet, FSSNet, HFENet, LMFFNet, CGRNet, MSFCN, CSCAUNet, SCSONet, LANet, AMFU_Net, MSCSFNet, MCCNet_VGG, and MSCF-Net.
Table 1. Ablation experiments of different modules on SeaShipsSeg dataset.

Case | Dice | Recall | Mcc | Jaccard | Params (M) | FPS
Baseline (U-Net) | 0.9210 | 0.8742 | 0.9201 | 0.8568 | 1.94 | 256.63
Baseline + MCFM | 0.9402 | 0.9355 | 0.9385 | 0.8905 | 17.01 | 134.23
Baseline + SPPM | 0.9388 | 0.9242 | 0.9368 | 0.8879 | 1.99 | 190.73
Baseline + AMMS | 0.9383 | 0.9099 | 0.9366 | 0.8856 | 2.93 | 104.80
Baseline + MCFM + SPPM | 0.9484 | 0.9422 | 0.9463 | 0.9037 | 8.53 | 130.86
Baseline + MCFM + AMMS | 0.9420 | 0.9358 | 0.9405 | 0.8939 | 18.00 | 75.96
Baseline + SPPM + AMMS | 0.9411 | 0.9333 | 0.9398 | 0.8916 | 2.54 | 98.52
Baseline + MCFM + SPPM + AMMS | 0.9502 | 0.9498 | 0.9481 | 0.9066 | 9.51 | 76.54
Table 2. Comparative experiments of traditional structure and corresponding improvement on SeaShipsSeg dataset.

Case | Dice | Recall | Mcc | Jaccard | Params (M) | FPS
U-Net + Traditional convolution in Figure 2a | 0.9210 | 0.8742 | 0.9201 | 0.8568 | 1.94 | 256.63
U-Net + MCFM in Figure 2b | 0.9402 | 0.9355 | 0.9385 | 0.8905 | 17.01 | 134.23
U-Net + Traditional SPPM in Figure 3a | 0.9254 | 0.8833 | 0.9241 | 0.8637 | 1.55 | 223.83
U-Net + The proposed SPPM in Figure 3b | 0.9388 | 0.9242 | 0.9368 | 0.8879 | 1.99 | 190.73
U-Net + AMMS without ARM | 0.9303 | 0.8942 | 0.9285 | 0.8719 | 2.86 | 160.96
U-Net + AMMS | 0.9383 | 0.9099 | 0.9366 | 0.8856 | 2.93 | 104.80
Table 3. Quantitative comparison on the SeaShipsSeg dataset.

Method | Dice | Recall | Mcc | Jaccard
U-Net [2] | 0.9210 | 0.8742 | 0.9201 | 0.8568
MHorUNet [23] | 0.9079 | 0.8949 | 0.9044 | 0.8359
BCMNet [30] | 0.9324 | 0.9297 | 0.9296 | 0.8756
ESFNet [31] | 0.8929 | 0.8835 | 0.8885 | 0.8114
FSSNet [32] | 0.9312 | 0.9228 | 0.9284 | 0.8730
HFENet [33] | 0.9258 | 0.9107 | 0.9231 | 0.8646
LMFFNet [34] | 0.9350 | 0.9316 | 0.9323 | 0.8800
CGRNet [35] | 0.9359 | 0.9252 | 0.9333 | 0.8816
MSFCN [36] | 0.9441 | 0.9323 | 0.9420 | 0.8960
CSCAUNet [37] | 0.9428 | 0.9416 | 0.9406 | 0.8942
SCSONet [38] | 0.9076 | 0.8980 | 0.9039 | 0.8347
LANet [39] | 0.9308 | 0.9196 | 0.9280 | 0.8726
AMFU_Net [40] | 0.9381 | 0.9273 | 0.9357 | 0.8856
MSCSFNet [41] | 0.9239 | 0.9183 | 0.9208 | 0.8620
MCCNet_VGG [42] | 0.9498 | 0.9393 | 0.9480 | 0.9062
MSCF-Net | 0.9502 | 0.9498 | 0.9481 | 0.9066
Table 4. Quantitative comparison on the MariBoatsSubclass dataset.

Method | Dice | Recall | Mcc | Jaccard
U-Net [2] | 0.8426 | 0.7828 | 0.8046 | 0.7296
MHorUNet [23] | 0.8549 | 0.8690 | 0.8112 | 0.7480
BCMNet [30] | 0.8706 | 0.8820 | 0.8317 | 0.7721
ESFNet [31] | 0.8467 | 0.8493 | 0.8012 | 0.7351
FSSNet [32] | 0.8766 | 0.8833 | 0.8397 | 0.7817
HFENet [33] | 0.8540 | 0.8583 | 0.8105 | 0.7468
LMFFNet [34] | 0.8875 | 0.9004 | 0.8536 | 0.7990
CGRNet [35] | 0.8801 | 0.8922 | 0.8440 | 0.7869
MSFCN [36] | 0.8538 | 0.8728 | 0.8096 | 0.7462
CSCAUNet [37] | 0.8857 | 0.8889 | 0.8520 | 0.7963
SCSONet [38] | 0.8490 | 0.8816 | 0.8029 | 0.7390
LANet [39] | 0.8680 | 0.8798 | 0.8287 | 0.7683
AMFU_Net [40] | 0.8666 | 0.8867 | 0.8264 | 0.7655
MSCSFNet [41] | 0.8445 | 0.8762 | 0.7976 | 0.7331
MCCNet_VGG [42] | 0.8904 | 0.9007 | 0.8573 | 0.8035
MSCF-Net | 0.8935 | 0.9019 | 0.8615 | 0.8085
Table 5. Complexity comparison with other models.

Method | Params (M) | FPS
U-Net [2] | 1.94 | 250.82
MHorUNet [23] | 3.49 | 12.08
BCMNet [30] | 32.04 | 24.78
ESFNet [31] | 0.09 | 158.64
FSSNet [32] | 0.17 | 117.23
HFENet [33] | 0.16 | 246.43
LMFFNet [34] | 1.34 | 86.38
CGRNet [35] | 24.57 | 131.78
MSFCN [36] | 14.17 | 138.84
CSCAUNet [37] | 35.27 | 43.88
SCSONet [38] | 0.16 | 57.98
LANet [39] | 23.79 | 124.65
AMFU_Net [40] | 0.47 | 59.49
MSCSFNet [41] | 29.70 | 21.88
MCCNet_VGG [42] | 67.65 | 52.36
MSCF-Net | 9.51 | 69.76