MDPI - Publisher of Open Access Journals

25 pages, 4296 KB

Open Access Article

StripSurface-YOLO: An Enhanced Yolov8n-Based Framework for Detecting Surface Defects on Strip Steel in Industrial Environments

by Haomin Li, Huanzun Zhang and Wenke Zang

Electronics 2025, 14(15), 2994; https://doi.org/10.3390/electronics14152994 - 27 Jul 2025

Cited by 1 | Viewed by 616

Abstract

Recent advances in precision manufacturing and high-end equipment technologies have imposed ever more stringent requirements on the accuracy, real-time performance, and lightweight design of online steel strip surface defect detection systems. To reconcile the persistent trade-off between detection precision and inference efficiency in complex industrial environments, this study proposes StripSurface-YOLO, a novel real-time defect detection framework built upon YOLOv8n. The core architecture integrates an Efficient Cross-Stage Local Perception module (ResGSCSP), which combines GSConv lightweight convolutions with a one-shot aggregation strategy, thereby markedly reducing both model parameters and computational complexity. To further enhance multi-scale feature representation, this study introduces an Efficient Multi-Scale Attention (EMA) mechanism at the feature-fusion stage, enabling the network to attend more effectively to critical defect regions. Moreover, conventional nearest-neighbor upsampling is replaced by DySample, which produces deeper, high-resolution feature maps enriched with semantic content, improving both inference speed and fusion quality. To heighten sensitivity to small-scale and low-contrast defects, the model adopts Focal Loss, which dynamically adjusts to sample difficulty. Extensive evaluations on the NEU-DET dataset demonstrate that StripSurface-YOLO reduces FLOPs by 11.6% and parameter count by 7.4% relative to the baseline YOLOv8n, while achieving respective improvements of 1.4%, 3.1%, 4.1%, and 3.0% in precision, recall, mAP50, and mAP50:95. Under adverse conditions, including contrast variations, brightness fluctuations, and Gaussian noise, StripSurface-YOLO outperforms the baseline model, delivering improvements of 5.0% in mAP50 and 4.7% in mAP50:95, attesting to the model's robust interference resistance. These findings underscore the potential of StripSurface-YOLO to meet the rigorous performance demands of real-time surface defect detection in the metal forging industry.
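The abstract names Focal Loss as the component that adjusts to sample difficulty. A minimal sketch of the standard binary focal loss, FL(p_t) = -α_t (1 - p_t)^γ log(p_t), may help illustrate the idea. The function name and the defaults γ = 2 and α = 0.25 are illustrative assumptions; the loss configuration actually used in StripSurface-YOLO is not stated in this listing.

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one prediction.

    p: predicted probability of the positive class (0..1)
    y: ground-truth label, 1 (positive) or 0 (negative)
    gamma, alpha: hypothetical defaults, not taken from the paper
    """
    # p_t is the probability the model assigns to the true class.
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # Clamp to avoid log(0).
    p_t = min(max(p_t, 1e-12), 1.0)
    # (1 - p_t)^gamma shrinks the loss of well-classified (easy) samples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

easy = focal_loss(0.9, 1)   # confident, correct prediction
hard = focal_loss(0.1, 1)   # confident, wrong prediction
print(easy < hard)
```

With γ = 0 the modulating factor disappears and the expression reduces to α-weighted cross-entropy, which is why γ is usually described as the knob that controls how strongly easy samples are down-weighted.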

24 pages, 3716 KB

Open Access Article

HRRPGraphNet++: Dynamic Graph Neural Network with Meta-Learning for Few-Shot HRRP Radar Target Recognition

by Lingfeng Chen, Zhiliang Pan, Qi Liu and Panhe Hu

Remote Sens. 2025, 17(12), 2108; https://doi.org/10.3390/rs17122108 - 19 Jun 2025

Cited by 1 | Viewed by 844

Abstract

High-Resolution Range Profile (HRRP) radar recognition suffers from data scarcity challenges in real-world applications. We present HRRPGraphNet++, a framework combining dynamic graph neural networks with meta-learning for few-shot HRRP recognition. Our approach generates graph representations dynamically through multi-head self attention (MSA) mechanisms that adapt to target-specific scattering characteristics, integrated with a specialized meta-learning framework employing layer-wise learning rates. Experiments demonstrate state-of-the-art performance in 1-shot (82.3%), 5-shot (91.8%), and 20-shot (94.7%) settings, with enhanced noise robustness (68.7% accuracy at 0 dB SNR). Our hybrid graph mechanism combines physical priors with learned relationships, significantly outperforming conventional methods in challenging scenarios. Full article

(This article belongs to the Special Issue Advanced AI Technology for Remote Sensing Analysis)

18 pages, 974 KB

Open Access Article

A Principal Component Analysis-Based Feature Optimization Network for Few-Shot Fine-Grained Image Classification

by Meijia Wang, Boyuan Zheng, Guochao Wang, Junpo Yang, Jin Lu and Weichuan Zhang

Mathematics 2025, 13(7), 1098; https://doi.org/10.3390/math13071098 - 27 Mar 2025

Viewed by 1036

Abstract

Feature map reconstruction networks (FRN) have demonstrated significant potential by leveraging feature reconstruction. However, the typical process of FRN gives rise to two notable issues. First, FRN exhibits high sensitivity to noise, particularly ambient noise, which can lead to substantial reconstruction errors and hinder the network’s ability to extract meaningful features. Second, FRN is particularly vulnerable to changes in data distribution. Owing to the fine-grained nature of the training data, the model is highly susceptible to overfitting, which may compromise its ability to extract effective feature representations when confronted with new classes. To address these challenges, this paper proposes a novel main feature selection module (MFSM), which suppresses feature noise interference and enhances the discriminative capacity of feature representations through principal component analysis (PCA). Extensive experiments validate the effectiveness of MFSM, revealing substantial improvements in classification accuracy for few-shot fine-grained image classification (FSFGIC) tasks. Full article
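The abstract describes using principal component analysis to keep the main features and suppress noise. The sketch below shows the underlying idea only: it finds the leading principal component of a small set of feature vectors by power iteration and projects the vectors onto it. The function names are hypothetical, and this is not the MFSM module from the paper, whose design is not described in this listing.

```python
import math

def top_principal_component(rows, iters=100):
    """Return (mean, unit vector) for the leading principal component.

    rows: list of equal-length feature vectors (at least two rows).
    Power iteration is an illustrative choice; it assumes the start
    vector is not orthogonal to the leading component.
    """
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered features.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # all rows identical: no variance to explain
            break
        v = [x / norm for x in w]
    return mean, v

def project(rows, mean, v):
    """Coordinate of each row along the principal direction v."""
    return [sum((r[j] - mean[j]) * v[j] for j in range(len(v))) for r in rows]

rows = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]]
mean, v = top_principal_component(rows)
scores = project(rows, mean, v)
```

Keeping only the leading components discards the low-variance directions, which is the usual sense in which PCA is said to suppress noise.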

(This article belongs to the Special Issue New Trends in Computer Vision, Deep Learning and Artificial Intelligence)

22 pages, 3887 KB

Open Access Article

The Impact of Linguistic Variations on Emotion Detection: A Study of Regionally Specific Synthetic Datasets

by Fernando Henrique Calderón Alvarado

Appl. Sci. 2025, 15(7), 3490; https://doi.org/10.3390/app15073490 - 22 Mar 2025

Viewed by 803

Abstract

This study examines the role of linguistic regional variations in synthetic dataset generation and their impact on emotion detection performance. Emotion detection is essential for natural language processing (NLP) applications such as social media analysis, customer service, and mental health monitoring. To explore this, synthetic datasets were generated using a state-of-the-art language model, incorporating English variations from the United States, United Kingdom, and India, alongside a general baseline dataset. Two levels of prompt specificity were employed to assess the influence of regional linguistic nuances. Statistical analyses—including frequency distribution, term frequency-inverse document frequency (TF-IDF), type–token ratio (TTR), hapax legomena, pointwise mutual information (PMI) scores, and key-phrase extraction—revealed significant linguistic diversity and regional distinctions in the generated datasets. To evaluate their effectiveness, classification experiments were conducted with two models using bidirectional encoder representations from transformers (BERT) and its de-noising sequence to sequence variation (BART), beginning with zero-shot classification on the contextualized affect representations for emotion recognition (CARER) dataset, followed by fine-tuning with both baseline and region-specific datasets. Results demonstrated that region-specific datasets, particularly those generated with detailed prompts, significantly improved classification accuracy compared to the baseline. These findings underscore the importance of incorporating global linguistic variations in synthetic dataset generation, offering insights into how regional adaptations can enhance emotion detection models for diverse NLP applications. Full article
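Two of the statistics listed in the abstract, the type–token ratio (TTR) and hapax legomena, are simple enough to sketch directly. The example below assumes whitespace tokenization and lowercasing, which is a simplification; the tokenizer used in the study is not stated here, and the function name is hypothetical.

```python
from collections import Counter

def lexical_stats(text):
    """Return (type-token ratio, sorted list of hapax legomena).

    Tokenization by lowercasing and splitting on whitespace is an
    assumption made for illustration.
    """
    tokens = text.lower().split()
    counts = Counter(tokens)
    # TTR = number of distinct word types / number of tokens.
    ttr = len(counts) / len(tokens) if tokens else 0.0
    # Hapax legomena are the words that occur exactly once.
    hapax = sorted(word for word, c in counts.items() if c == 1)
    return ttr, hapax

ttr, hapax = lexical_stats("the cat saw the dog")
print(ttr, hapax)  # 0.8 ['cat', 'dog', 'saw']
```

A higher TTR and a larger share of hapax legomena both indicate a more varied vocabulary, which is how such measures can be used to compare the lexical diversity of regional datasets.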

(This article belongs to the Special Issue Application of Affective Computing)

15 pages, 276 KB

Open Access Feature Paper Article

by Michael Grabchak

Mathematics 2025, 13(6), 907; https://doi.org/10.3390/math13060907 - 8 Mar 2025

Viewed by 823

Abstract

A common approach to simulating a Lévy process is to truncate its shot-noise representation. We focus on subordinators and introduce the remainder process, which represents the jumps that are removed by the truncation. We characterize when these processes are self-similar and show that, in the self-similar case, they can be indexed by a parameter $\alpha \in (-\infty, 1)$. When $\alpha \in (0, 1)$, they correspond to $\alpha$-stable distributions, and when $\alpha = 0$, they correspond to certain generalizations of the Dickman distribution. Thus, the Dickman distribution plays the role of a 0-stable distribution in this context.
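To make the idea of truncating a shot-noise representation concrete, the sketch below generates the largest jumps of a stable subordinator on the unit time interval for α in (0, 1). It assumes a Lévy measure with tail ν((x, ∞)) = x^(-α), so that the jumps are Γ_i^(-1/α), where Γ_i are the arrival times of a unit-rate Poisson process. Truncating after a fixed number of terms is an illustrative choice and may differ from the truncation studied in the paper; the function name is hypothetical.

```python
import random

def truncated_stable_jumps(alpha, n_terms, rng):
    """First n_terms jumps of the series sum_i Gamma_i ** (-1 / alpha).

    alpha: stability index, assumed to lie in (0, 1)
    rng:   a random.Random instance, so results are reproducible
    """
    gamma = 0.0
    jumps = []
    for _ in range(n_terms):
        # Poisson arrival times are cumulative sums of Exp(1) variables.
        gamma += rng.expovariate(1.0)
        # Inverse of the assumed Levy tail x ** (-alpha).
        jumps.append(gamma ** (-1.0 / alpha))
    return jumps

rng = random.Random(0)
jumps = truncated_stable_jumps(0.5, 50, rng)
approx_value = sum(jumps)  # truncated approximation of the subordinator at time 1
```

Because the arrival times increase, the jumps come out in decreasing order, so the truncation keeps the largest jumps. The jumps that are dropped (all terms beyond `n_terms`) are what the abstract calls the remainder.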

(This article belongs to the Section D1: Probability and Statistics)

22 pages, 5498 KB

Open Access Article

Small-Sample Target Detection Across Domains Based on Supervision and Distillation

by Fusheng Sun, Jianli Jia, Xie Han, Liqun Kuang and Huiyan Han

Electronics 2024, 13(24), 4975; https://doi.org/10.3390/electronics13244975 - 18 Dec 2024

Cited by 1 | Viewed by 1000

Abstract

To address the issues of significant object discrepancies, low similarity, and image noise interference between source and target domains in object detection, we propose a supervised learning approach combined with knowledge distillation. First, student and teacher models are jointly trained through supervised and distillation-based approaches, iteratively refining the inter-model weights to mitigate overfitting. Second, a combined convolutional module is integrated into the feature extraction network of the student model to reduce redundant computation; an explicit visual center module is embedded within the feature pyramid network to bolster feature representation; and a spatial grouping enhancement module is incorporated into the region proposal network to mitigate the adverse effects of noise on the outcomes. Finally, the model is optimized using the loss functions from both the supervised and knowledge distillation phases. The experimental results demonstrate that this strategy significantly boosts classification and identification accuracy on cross-domain datasets: compared with TFA (Task-agnostic Fine-tuning and Adapter), CD-FSOD (Cross-Domain Few-Shot Object Detection), and DeFRCN (Decoupled Faster R-CNN for Few-Shot Object Detection), it increases detection accuracy by 1.67% and 1.87% with sample sizes of 1 and 5, respectively.

(This article belongs to the Topic Visual Computing and Understanding: New Developments and Trends)

21 pages, 1178 KB

Open Access Article

CLG: Contrastive Label Generation with Knowledge for Few-Shot Learning

by Han Ma, Baoyu Fan, Benjamin K. Ng and Chan-Tong Lam

Mathematics 2024, 12(3), 472; https://doi.org/10.3390/math12030472 - 1 Feb 2024

Cited by 1 | Viewed by 1526

Abstract

Training large-scale models requires big data. However, the few-shot problem is difficult to resolve because training data are inadequate. Performing a task with only a few training samples is valuable in application scenarios where collecting big data is limited by cost and resources. To tackle this problem, we present a simple and efficient method, contrastive label generation with knowledge for few-shot learning (CLG). Specifically, we (1) propose contrastive label generation to align the label with the data input and enhance feature representations; (2) propose a label knowledge filter to avoid noise when injecting explicit knowledge into the data and label; (3) employ a label logits mask to simplify the task; and (4) employ a multi-task fusion loss to learn different perspectives from the training set. The experiments demonstrate that CLG achieves an accuracy of 59.237%, about 3% higher than the best baseline. This shows that CLG obtains better features and gives the model more information about the input sentences, improving its classification ability.

(This article belongs to the Special Issue Hybrid Data Processing by Combining Machine Learning, Expert, Safety and Security)

32 pages, 1285 KB

Open Access Article

Comparative Analysis of NLP-Based Models for Company Classification

by Maryan Rizinski, Andrej Jankov, Vignesh Sankaradas, Eugene Pinsky, Igor Mishkovski and Dimitar Trajanov

Information 2024, 15(2), 77; https://doi.org/10.3390/info15020077 - 31 Jan 2024

Cited by 4 | Viewed by 7128

Abstract

The task of company classification is traditionally performed using established standards, such as the Global Industry Classification Standard (GICS). However, these approaches heavily rely on laborious manual efforts by domain experts, resulting in slow, costly, and vendor-specific assignments. Therefore, we investigate recent natural language processing (NLP) advancements to automate the company classification process. In particular, we employ and evaluate various NLP-based models, including zero-shot learning, One-vs-Rest classification, multi-class classifiers, and ChatGPT-aided classification. We conduct a comprehensive comparison among these models to assess their effectiveness in the company classification task. The evaluation uses the Wharton Research Data Services (WRDS) dataset, consisting of textual descriptions of publicly traded companies. Our findings reveal that the RoBERTa and One-vs-Rest classifiers surpass the other methods, achieving F1 scores of 0.81 and 0.80 on the WRDS dataset, respectively. These results demonstrate that deep learning algorithms offer the potential to automate, standardize, and continuously update classification systems in an efficient and cost-effective way. In addition, we introduce several improvements to the multi-class classification techniques: (1) in the zero-shot methodology, we use TF-IDF to enhance sector representation, yielding improved accuracy in comparison to standard zero-shot classifiers; (2) next, we use ChatGPT for dataset generation, revealing potential in scenarios where datasets of company descriptions are lacking; and (3) we also employ K-Fold to reduce noise in the WRDS dataset, followed by conducting experiments to assess the impact of noise reduction on the company classification results. Full article

(This article belongs to the Special Issue Second Edition of Predictive Analytics and Data Science)

22 pages, 4268 KB

Open Access Article

Multi-Head Self-Attention-Enhanced Prototype Network with Contrastive–Center Loss for Few-Shot Relation Extraction

by Jiangtao Ma, Jia Cheng, Yonggang Chen, Kunlin Li, Fan Zhang and Zhanlei Shang

Appl. Sci. 2024, 14(1), 103; https://doi.org/10.3390/app14010103 - 21 Dec 2023

Cited by 3 | Viewed by 2554

Abstract

Few-shot relation extraction (FSRE) constitutes a critical task in natural language processing (NLP), involving learning relationship characteristics from limited instances to enable the accurate classification of new relations. The existing research primarily concentrates on using prototype networks for FSRE and enhancing their performance by incorporating external knowledge. However, these methods disregard the potential interactions among different prototype networks, and each prototype network can only learn and infer from its limited instances, which may limit the robustness and reliability of the prototype representations. To tackle the concerns outlined above, this paper introduces a novel prototype network called SACT (multi-head self-attention and contrastive-center loss), aimed at obtaining more comprehensive and precise interaction information from other prototype networks to bolster the reliability of the prototype network. Firstly, SACT employs a multi-head self-attention mechanism for capturing interaction information among different prototypes from traditional prototype networks, reducing the noise introduced by unknown categories with a small sample through information aggregation. Furthermore, SACT introduces a new loss function, the contrastive–center loss function, aimed at tightly clustering samples from a similar relationship category in the center of the feature space while dispersing samples from different relationship categories. Through extensive experiments on FSRE datasets, this paper demonstrates the outstanding performance of SACT, providing strong evidence for the effectiveness and practicality of SACT. Full article

(This article belongs to the Special Issue Knowledge Graphs: State-of-the-Art and Applications)

24 pages, 696 KB

Open Access Article

Radar Active Jamming Recognition under Open World Setting

by Yupei Zhang, Zhijin Zhao and Yi Bu

Remote Sens. 2023, 15(16), 4107; https://doi.org/10.3390/rs15164107 - 21 Aug 2023

Cited by 7 | Viewed by 2541

Abstract

To address the issue that conventional methods cannot recognize unknown patterns of radar jamming, this study adopts the idea of zero-shot learning (ZSL) and proposes an open world recognition method, RCAE-OWR, based on residual convolutional autoencoders, which can implement the classification of known and unknown patterns. In the supervised training phase, a residual convolutional autoencoder network structure is first constructed to extract the semantic information from a training set consisting solely of known jamming patterns. By incorporating center loss and reconstruction loss into the softmax loss function, a joint loss function is constructed to minimize the intra-class distance and maximize the inter-class distance in the jamming features. Moving to the unsupervised classification phase, a test set containing both known and unknown patterns is fed into the trained encoder, and a distance-based recognition method is utilized to classify the jamming signals. The results demonstrate that the proposed model not only achieves sufficient learning and representation of known jamming patterns but also effectively identifies and classifies unknown jamming signals. When the jamming-to-noise ratio (JNR) exceeds 10 dB, the recognition rate for seven known jamming patterns and two unknown jamming patterns is more than 92%. Full article
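The distance-based recognition step in the abstract can be illustrated with a simple rule: assign an encoded sample to the nearest known class center, unless even the nearest center is too far away, in which case the sample is flagged as an unknown pattern. The sketch below assumes Euclidean distance and a single fixed threshold; both are assumptions made for illustration, since the exact rule used by RCAE-OWR is not given in this listing, and the class names are invented.

```python
import math

def recognize(feature, centers, threshold):
    """Nearest-center classification with rejection of unknown patterns.

    feature:   encoded feature vector of one test sample
    centers:   dict mapping known class name -> class center vector
    threshold: hypothetical distance limit beyond which a sample is
               treated as an unknown pattern
    """
    nearest = min(centers, key=lambda name: math.dist(feature, centers[name]))
    if math.dist(feature, centers[nearest]) <= threshold:
        return nearest
    return "unknown"

# Toy class centers in a 2-D feature space (illustrative values only).
centers = {"known_a": [0.0, 0.0], "known_b": [10.0, 0.0]}
label_near = recognize([1.0, 0.0], centers, threshold=3.0)
label_far = recognize([5.0, 0.0], centers, threshold=3.0)
```

A rule of this kind only works when known classes form tight, well-separated clusters, which is consistent with the abstract's emphasis on minimizing intra-class distance and maximizing inter-class distance during training.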

(This article belongs to the Special Issue Artificial Intelligence-Driven Methods for Remote Sensing Target and Object Detection)

18 pages, 1101 KB

Open Access Article

Self-Supervised Representation Learning for Quasi-Simultaneous Arrival Signal Identification Based on Reconnaissance Drones

by Linqing Guo, Mingyang Du, Jingwei Xiong, Zilong Wu and Jifei Pan

Drones 2023, 7(7), 475; https://doi.org/10.3390/drones7070475 - 19 Jul 2023

Cited by 3 | Viewed by 1827

Abstract

Reconnaissance unmanned aerial vehicles are specifically designed to estimate parameters and process intercepted signals for the purpose of identifying and locating radars. However, distinguishing quasi-simultaneous arrival signals (QSAS) has become increasingly challenging in complex electromagnetic environments. To address this problem, a framework for self-supervised deep representation learning is proposed. The framework consists of two phases: (1) pre-training an autoencoder, in which ConvNeXt V2 is trained on unlabeled QSAS to extract features from masked time–frequency images and reconstruct the corresponding signal in both the time and frequency domains; and (2) transferring the learned knowledge, in which the encoder layers are frozen and the linear layer is fine-tuned to classify QSAS under few-shot conditions for downstream tasks. Experimental results demonstrate that the proposed algorithm achieves an average recognition accuracy of over 81% for signal-to-noise ratios from −16 dB to 16 dB. Compared with existing CNN-based and Transformer-based neural networks, the proposed algorithm shortens testing time by about 11× and improves accuracy by up to 21.95%.

(This article belongs to the Special Issue AI Based Signal Processing for Drones)

14 pages, 3689 KB

Open Access Article

Coarse-Grained Modeling of EUV Patterning Process Reflecting Photochemical Reactions and Chain Conformations

by Tae-Yi Kim, In-Hwa Kang, Juhae Park, Myungwoong Kim, Hye-Keun Oh and Su-Mi Hur

Polymers 2023, 15(9), 1988; https://doi.org/10.3390/polym15091988 - 22 Apr 2023

Cited by 3 | Viewed by 3369

Abstract

Enabling extreme ultraviolet lithography (EUVL) as a viable and efficient sub-10 nm patterning tool requires addressing the critical issue of reducing line edge roughness (LER). Stochastic effects from random and local variability in photon distribution and photochemical reactions have been considered the primary cause of LER. However, polymer chain conformation has recently attracted attention as an additional factor influencing LER, necessitating detailed computational studies with explicit chain representation and photon distribution to overcome the existing approach based on continuum models and random variables. We developed a coarse-grained molecular simulation model for an EUV patterning process to investigate the effect of chain conformation variation and stochastic effects via photon shot noise and acid diffusion on the roughness of the pattern. Our molecular simulation demonstrated that final LER is most sensitive to the variation in photon distributions, while material distributions and acid diffusion rate also impact LER; thus, the intrinsic limit of LER is expected even at extremely suppressed stochastic effects. Furthermore, we proposed and tested a novel approach to improve the roughness by controlling the initial polymer chain orientation. Full article

(This article belongs to the Section Polymer Chemistry)

26 pages, 2460 KB

Open Access Article

Few-Shot Emergency Siren Detection

by Michela Cantarini, Leonardo Gabrielli and Stefano Squartini

Sensors 2022, 22(12), 4338; https://doi.org/10.3390/s22124338 - 8 Jun 2022

Cited by 11 | Viewed by 5320

Abstract

It is a well-established practice to build a robust system for sound event detection by training supervised deep learning models on large datasets, but audio data collection and labeling are often challenging and require large amounts of effort. This paper proposes a workflow based on few-shot metric learning for emergency siren detection performed in steps: prototypical networks are trained on publicly available sources or synthetic data in multiple combinations, and at inference time, the best knowledge learned in associating a sound with its class representation is transferred to identify ambulance sirens, given only a few instances for the prototype computation. Performance is evaluated on siren recordings acquired by sensors inside and outside the cabin of an equipped car, investigating the contribution of filtering techniques for background noise reduction. The results show the effectiveness of the proposed approach, achieving AUPRC scores equal to 0.86 and 0.91 in unfiltered and filtered conditions, respectively, outperforming a convolutional baseline model with and without fine-tuning for domain adaptation. Extensive experiments conducted on several recording sensor placements prove that few-shot learning is a reliable technique even in real-world scenarios and gives valuable insights for developing an in-car emergency vehicle detection system. Full article
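The prototype computation mentioned in the abstract follows the general recipe of prototypical networks: each class prototype is the mean of the embeddings of its few support examples, and a query is scored by a softmax over negative squared distances to the prototypes. The sketch below shows that recipe on plain vectors. The embeddings, class names, and function names are made up for illustration, and the embedding network used in the paper is not modeled here.

```python
import math

def prototypes(support):
    """Mean embedding per class; support maps label -> list of vectors."""
    return {label: [sum(col) / len(vecs) for col in zip(*vecs)]
            for label, vecs in support.items()}

def class_probabilities(query, protos):
    """Softmax over negative squared Euclidean distances to prototypes."""
    logits = {label: -math.dist(query, p) ** 2 for label, p in protos.items()}
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(z - top) for label, z in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

# Two support embeddings per class (illustrative values only).
support = {
    "siren": [[1.0, 1.0], [3.0, 1.0]],
    "noise": [[-2.0, 0.0], [-2.0, -2.0]],
}
protos = prototypes(support)
probs = class_probabilities([2.0, 1.0], protos)
```

Because the prototypes are recomputed from whatever support examples are supplied at inference time, new recording conditions can be handled by providing a few new examples rather than retraining the embedding.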

(This article belongs to the Special Issue Artificial Intelligence-Based Audio Signal Processing)

19 pages, 2475 KB

Open Access Article

Graph-Based Embedding Smoothing Network for Few-Shot Scene Classification of Remote Sensing Images

by Zhengwu Yuan, Wendong Huang, Chan Tang, Aixia Yang and Xiaobo Luo

Remote Sens. 2022, 14(5), 1161; https://doi.org/10.3390/rs14051161 - 26 Feb 2022

Cited by 25 | Viewed by 3416

Abstract

As a fundamental task in the field of remote sensing, scene classification is increasingly attracting attention. The most popular way to solve scene classification is to train a deep neural network with a large-scale remote sensing dataset. However, given a small amount of data, how to train a deep neural network with outstanding performance remains a challenge. Existing methods seek to take advantage of transfer knowledge or meta-knowledge to resolve the scene classification issue of remote sensing images with a handful of labeled samples while ignoring various class-irrelevant noises existing in scene features and the specificity of different tasks. For this reason, in this paper, an end-to-end graph neural network is presented to enhance the performance of scene classification in few-shot scenarios, referred to as the graph-based embedding smoothing network (GES-Net). Specifically, GES-Net adopts an unsupervised non-parametric regularizer, called embedding smoothing, to regularize embedding features. Embedding smoothing can capture high-order feature interactions in an unsupervised manner, which is adopted to remove undesired noises from embedding features and yields smoother embedding features. Moreover, instead of the traditional sample-level relation representation, GES-Net introduces a new task-level relation representation to construct the graph. The task-level relation representation can capture the relations between nodes from the perspective of the whole task rather than only between samples, which can highlight subtle differences between nodes and enhance the discrimination of the relations between nodes. Experimental results on three public remote sensing datasets, UC Merced, WHU-RS19, and NWPU-RESISC45, showed that the proposed GES-Net approach obtained state-of-the-art results in the settings of limited labeled samples. Full article

(This article belongs to the Topic Big Data and Artificial Intelligence)

19 pages, 5529 KB

Open Access Article

Combinational Fusion and Global Attention of the Single-Shot Method for Synthetic Aperture Radar Ship Detection

by Libo Xu, Chaoyi Pang, Yan Guo and Zhenyu Shu

Remote Sens. 2021, 13(23), 4781; https://doi.org/10.3390/rs13234781 - 25 Nov 2021

Cited by 3 | Viewed by 2492

Abstract

Synthetic Aperture Radar (SAR), an active remote sensing imaging radar technology, has a certain surface penetration ability and can work all day and in all weather conditions. It is widely applied in ship detection to quickly collect ship information on the ocean surface from SAR images. However, ship SAR images are often blurred, have large noise interference, and contain many small targets, which poses challenges to popular one-stage detectors, such as the single-shot multi-box detector (SSD). To solve these problems, we designed a novel network structure, a combinational fusion SSD (CF-SSD), based on the framework of the original SSD. It mainly includes three blocks, namely a combinational fusion (CF) block, a global attention module (GAM), and a mixed loss function block, to significantly improve the detection accuracy of SAR images and remote sensing images and maintain a fast inference speed. The CF block equips every feature map with the ability to detect objects of all sizes at different levels and forms a consistent and powerful detection structure to learn more useful information for SAR features. The GAM block produces attention weights and considers the channel attention information of feature maps at various scales or across layers so that it can obtain better feature representations from the global perspective. The mixed loss function block can better learn the positions of the ground-truth anchor boxes by considering corner and center coordinates simultaneously. CF-SSD can effectively extract and fuse the features, avoid the loss of small or blurred object information, and precisely locate the object position from SAR images. We conducted experiments on the SAR ship dataset SSDD and achieved a 90.3% mAP and a fast inference speed close to that of the original SSD. We also tested our model on the remote sensing dataset NWPU VHR-10 and the common dataset VOC2007. The experimental results indicate that our proposed model simultaneously achieves excellent detection performance and high efficiency.

(This article belongs to the Special Issue Deep Learning for Radar and Sonar Image Processing)
