1. Introduction
Machine vision has advanced rapidly alongside the growth of computing and related technologies. The field, which combines image processing and sensors with computers, has driven progress in research directions such as semantic segmentation, target detection and tracking, and super-resolution image reconstruction. Owing to its broad application value and research significance, infrared target detection has attracted considerable attention within machine vision. It has been applied extensively in both civil and military domains, for example in ground target detection [1] and infrared guidance systems [2], and infrared ship target detection is particularly valuable for sea rescue and marine safety.
Current ship target detection systems mainly rely on visible light, synthetic aperture radar (SAR), hyperspectral, and infrared imaging technologies [3]. Visible light imaging offers high resolution, rich texture detail, and a broad sensing range, but it struggles to identify targets precisely when little light is reflected, for example in bad weather. SAR is capable of long-distance, all-weather detection, but on the sea surface it is susceptible to electromagnetic interference, islands, harbors, and atmospheric influences. Unlike visible light imaging, infrared imaging can penetrate smoke, resists interference, and does not depend on lighting conditions; because it receives radiation passively, it is also more covert than SAR [4]. Infrared imaging-based surface ship target detection is therefore versatile in complex sea conditions. However, owing to the characteristics of infrared imaging, its resolution is much lower than that of visible light images, which leads to the loss of crucial information about weak targets. (A target is considered weak if its contrast is less than 15%, its signal-to-noise ratio is less than 1.5, and its imaging size is less than 80 pixels, that is, roughly 0.15% of a 256 × 256 pixel image.) Reference [5] shows that the ship feature extraction process affects the detection outcome. Additionally, the many ships docked in coastal harbors obscure one another, which poses a significant challenge to the accurate recognition and classification of infrared ship targets.
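The weak-target criteria given in parentheses above can be expressed as a simple predicate. The following is a minimal sketch, not part of the proposed method: the `TargetStats` container and `is_weak_target` function are hypothetical names, and it assumes that contrast is supplied as a fraction, that the signal-to-noise ratio is a linear ratio rather than decibels, and that all three conditions must hold together.

```python
from dataclasses import dataclass


@dataclass
class TargetStats:
    """Hypothetical per-target measurements (names are illustrative)."""
    contrast: float  # target/background contrast as a fraction, e.g. 0.12 for 12%
    snr: float       # signal-to-noise ratio, assumed linear (not dB)
    area_px: int     # number of pixels the target occupies


def is_weak_target(t, max_contrast=0.15, max_snr=1.5, max_area_px=80):
    """Return True when all three thresholds quoted in the text are met."""
    return (
        t.contrast < max_contrast
        and t.snr < max_snr
        and t.area_px < max_area_px
    )


dim_small = TargetStats(contrast=0.10, snr=1.2, area_px=40)
bright_small = TargetStats(contrast=0.30, snr=1.2, area_px=40)
weak_flags = [is_weak_target(dim_small), is_weak_target(bright_small)]
```

How contrast and signal-to-noise ratio are measured for a given image is left open here; the thresholds are exposed as parameters so that a different convention can be substituted.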
To summarize, weak targets are a key factor constraining the detection performance of infrared imaging systems [6]. Since 2012, when deep learning swept through computer vision [7], increasingly advanced detection algorithms have brought new ideas to weak target detection and produced numerous detection frameworks [8]. General-purpose target detection methods fall into two primary categories: single-stage detectors, exemplified by YOLO [9], which are distinguished by fast detection and lightweight design, and two-stage detectors, dominated by the Faster-RCNN [10] series of algorithms, which have an advantage in multi-scale detection. There are also Transformer-based detection algorithms, such as the DETR [11] series. These algorithms focus on modeling global relationships in the image, but their ability to perceive local detail is insufficient, which leads to poor performance in small target detection. Current target detection research is therefore focused on improving the performance of algorithms for weak and small targets.
To overcome this challenge, researchers have proposed several methods to improve the detection of small and weak objects. Yi et al. [12] created an attention-guided bidirectional feature pyramid network that enhances the recognition of tiny objects in remote sensing images. Sun et al. [13] presented an enhanced anchor-free method aimed at improving the efficiency of small ship recognition in high-resolution synthetic aperture radar (HRSAR) images. Xie et al. [14] employed a coordinate attention module to aggregate positional information, thereby enhancing the recognition of small objects with intricate distributions in remote sensing images. Yu et al. [15] designed a visually salient lightweight ship detector (VS-LSDet) to identify small ship targets in SAR images. Additionally, Sun et al. [16] proposed a YOLO-based SAR ship detection algorithm using angle classification and bidirectional feature fusion. Li et al. [17] proposed a cross-layer attention network to obtain stronger features for weak targets in visible light images. In recent years, weak target detection for infrared images has also emerged. Dai et al. [18] proposed an asymmetric contextual modulation to modulate both high-level semantic features and low-level features. Ye et al. [19] designed a mixed attention mechanism to improve feature fusion and created a feature fusion technique to gather long-range contextual information about small objects. To enhance feature representation, Zhang et al. [20] created a spatial and frequency attention-based decoder that extracts spatial context and frequency domain context. Si et al. [21] designed an improved bidirectional feature fusion pyramid network that achieves multi-scale weighted feature fusion across layers and increases the detection rate of ships in remote sensing images. Guo et al. [22] designed a bidirectional attention-based feature pyramid network (BAFPN) for offshore vessel detection, enhancing the detection of tiny vessels. Wang et al. [23] improved the accuracy of weak target detection by introducing SPD-Conv and proposing a feature fusion network with a fusion attention mechanism. Zhang et al. [24] designed a feature enhancement, fusion, and context-aware detector to increase the likelihood of detecting small targets in remote sensing images. Gong et al. [25] proposed the ASDet detector, which optimizes the loss function to mitigate the discontinuous boundary issue caused by angular periodicity, thereby improving the detection accuracy of small objects. Yuan et al. [26] introduced a two-stage small object detection framework based on a Coarse-to-Fine pipeline and feature imitation learning, which enhances the detection accuracy of small objects.
Recent research has identified several factors contributing to the low efficiency of infrared target detection: (1) Small targets occupy a small proportion of the pixels in an image and carry few target features; during the downsampling steps of feature extraction, small target information is often lost, leading to low detection accuracy. (2) Convolutional neural network (CNN)-based object detection algorithms tend to lose contextual information during feature extraction, further decreasing detection accuracy for small targets. (3) Background information interferes with detection.
To address the challenges of infrared ship target detection under complex sea–sky backgrounds, where weak extraction of small target features results in low detection accuracy, this paper proposes a novel infrared ship target detection algorithm, YOLO-IRS. First, to improve the detection of weak and small targets, we introduce the Swin Transformer to extract features from infrared ship images. By employing a shifted-window multi-head self-attention mechanism, the window’s receptive field is expanded, enhancing the model’s ability to focus on global features during feature extraction and thereby improving small target detection performance. Second, we design the C3KAN module, which improves the Bottleneck module of the C2f block in the neck of the baseline model. In this module, the Conv layer is replaced by the KAN network, which adapts better to multi-scale variations in infrared target detection. This not only improves detection accuracy but also reduces false positives and missed detections in complex backgrounds and under dense occlusion.
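To illustrate why shifting the windows widens the effective field of view, the sketch below assigns every pixel of a small feature map to an attention window, first with a regular partition and then after a cyclic shift. It is an illustrative sketch only, not the YOLO-IRS implementation: the function name `window_ids`, the 8 × 8 map, and the window size of 4 are assumptions, and the attention mask that the Swin Transformer applies to wrapped-around regions is omitted.

```python
def window_ids(height, width, window, shift=0):
    """Label each pixel with the index of the attention window it falls in.

    With shift=0 this is a regular, non-overlapping partition. With
    shift>0 the map is cyclically shifted before partitioning, so pixels
    that sat in different windows can end up sharing one.
    """
    if height % window or width % window:
        raise ValueError("feature map size must be a multiple of the window size")
    windows_per_row = width // window
    grid = []
    for r in range(height):
        row = []
        for c in range(width):
            # Position of this pixel after the cyclic shift.
            rr = (r - shift) % height
            cc = (c - shift) % width
            row.append((rr // window) * windows_per_row + cc // window)
        grid.append(row)
    return grid


regular = window_ids(8, 8, window=4)
shifted = window_ids(8, 8, window=4, shift=2)

# Two neighbouring pixels on either side of a window boundary.
a, b = (3, 3), (4, 4)
apart_before = regular[a[0]][a[1]] != regular[b[0]][b[1]]
together_after = shifted[a[0]][a[1]] == shifted[b[0]][b[1]]
```

In the regular partition the two neighbouring pixels cannot attend to each other because they lie in different windows; after a shift of half the window size they share a window. Alternating the two partitions across layers is what lets information propagate between windows.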
Infrared ship targets on the sea surface are characterized by low contrast and a low signal-to-noise ratio. Their shape and texture features are indistinct, so a target is easily overwhelmed by background clutter and missed. Additionally, the sea surface contains islands, waves, cumulus clouds, and other interference with brightness similar to that of a ship, which can easily lead to false detections. Furthermore, near ports in coastal areas, the high density of ships increases the likelihood of occlusion, resulting in missed detections. Because of these characteristics, traditional methods suffer from poor anti-interference capability and insufficient generalization, while existing deep learning-based detection algorithms are not tailored to this task, making it difficult to achieve accurate identification, classification, and localization of infrared ship targets.
With these obstacles in mind, and after comparing several network models, this paper proposes an infrared ship target detection method, YOLO-IRS. The approach addresses target loss due to occlusion in high-density scenes, exhibiting strong anti-interference capability and strong performance in small target detection. Experimental comparisons confirm the efficacy of the approach: precision is increased by 1.3%, mean average precision (mAP50) by 0.5%, and mAP50–95 by 1.7%. The primary contributions of this article are as follows:
(1) To capture target context information and enhance the recognition accuracy of small targets, this paper introduces the Swin Transformer module, which expands the window field of view through a shifted-window multi-head self-attention mechanism, thereby enhancing the ability to focus on global features during feature extraction.
(2) To address false and missed detections in complex backgrounds and under dense occlusion, this paper designs the C3KAN module, which effectively improves the recognition accuracy of infrared ship targets.
(3) Extensive experiments were conducted on a public infrared ship dataset. Ablation experiments test the effects of the Swin Transformer module and the C3KAN module on the results, and comparative experiments verify the anti-interference capability of the proposed YOLO-IRS algorithm and its effectiveness in detecting infrared ships in complex backgrounds.
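The precision and mAP figures quoted above rest on the intersection over union (IoU) between predicted and ground-truth boxes: mAP50 counts a prediction as correct when its IoU with a ground-truth box is at least 0.5, while mAP50–95 averages over IoU thresholds from 0.5 to 0.95. The sketch below shows IoU and precision at a single threshold. It is a simplified illustration under stated assumptions, not the evaluation code used in the experiments: the names `iou` and `precision_at` are hypothetical, boxes are assumed to be `(x1, y1, x2, y2)` tuples, predictions are assumed to be pre-sorted by descending confidence, and a greedy one-to-one matching is used.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_at(preds, truths, thr=0.5):
    """Fraction of predictions matched to a distinct ground-truth box.

    Each ground-truth box can be matched at most once, so duplicate
    detections of the same ship count as false positives.
    """
    unmatched = list(truths)
    true_positives = 0
    for p in preds:
        best = max(unmatched, key=lambda t: iou(p, t), default=None)
        if best is not None and iou(p, best) >= thr:
            true_positives += 1
            unmatched.remove(best)
    return true_positives / len(preds) if preds else 0.0


truths = [(0, 0, 10, 10)]
preds = [(0, 0, 10, 10), (20, 20, 30, 30)]  # one hit, one false positive
p50 = precision_at(preds, truths, thr=0.5)
```

Raising the threshold from 0.5 toward 0.95 demands tighter localization, which is why an improvement in mAP50–95 reflects better box placement and not only more correct detections.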
5. Conclusions
To address the low detection rate of infrared ship targets in complex ocean backgrounds and dense scenes, and the inefficient feature extraction for weak ship targets in blurred scenes that leads to insufficient detection capability, this paper proposes a detection network, YOLO-IRS, with YOLOv10 as the baseline. First, the paper introduces the Swin Transformer module, which extracts global image features to reduce the missed detections caused by dense occlusion of infrared ships near coastal ports. Second, the paper designs the C3KAN module, which achieves better feature extraction for weak and small infrared ship targets, enhances the feature fusion of multi-scale ship targets, effectively detects infrared ship targets of different scales in the same scene, and improves the model’s detection performance for small targets. Compared with the original algorithm, YOLO-IRS improves detection accuracy for all target classes, with precision (P) improved by 1.3%, mAP50 by 0.5%, and mAP50–95 by 1.7%. This paper also compares YOLO-IRS with other mainstream detection algorithms, and the results demonstrate the superiority of the proposed algorithm. The experimental results show that YOLO-IRS can effectively reduce false and missed detections when dealing with small targets, dense scenes, blurred backgrounds, and complex backgrounds, thereby improving the accuracy of infrared ship detection. However, infrared ship targets in distant sea scenes become even weaker, making it challenging for the algorithm to recognize them accurately; future research will concentrate on resolving this problem.