MDPI - Publisher of Open Access Journals

39 pages, 8519 KB

Open Access Article

Physics-Prior-Augmented Deep Learning for Acoustic Convergence Zone Identification in Data-Scarce Marine Environments

by Haoyu Wang, Shuai Chang, Hao Zheng, Shuo Yang, Jianxin He and Xiong Deng

J. Mar. Sci. Eng. 2026, 14(11), 1028; https://doi.org/10.3390/jmse14111028 - 31 May 2026

Viewed by 70

High-precision identification of acoustic convergence zones (CZs) and acoustic shadow zones (SZs) is a core prerequisite for deep-sea sonar performance prediction and long-range underwater target detection. However, in data-scarce marine environments, traditional acoustic identification methods suffer from high environmental sensitivity and significant computational costs, while pure data-driven deep learning methods face dilemmas such as a lack of physical consistency and poor generalization on small samples. To address these issues, a three-level cascaded recognition framework based on physics-prior-augmented deep learning is proposed in this paper, enabling accurate segmentation of CZs and intelligent classification of sound field types under data-scarce scenarios. In this framework, physical acoustic principles are incorporated exclusively as priors through a training dataset generated by a Gaussian beam acoustic propagation code (Bellhop) and through hand-crafted geometric features derived post hoc from the initial segmentation outputs. Taking a typical deep-sea area in the Northwest Pacific Ocean as the research object, a hybrid dataset comprising 5000 simulated transmission loss images and 500 simulated images from a geographically distinct sea area is constructed. The sound field is categorized into four types: strong convergence, usable convergence, weak convergence, and shadow zone. In the first stage, the ResNet-34 backbone is improved by integrating deformable convolution and a global statistical feature module, which, combined with a joint loss function, achieves high-precision pixel-level segmentation of CZs and SZs, with the regional gray contrast reaching 86.9%. In the second stage, a customized dual-channel VGG16 architecture is designed to fuse the extracted geometric priors and visual features, achieving a sound field classification accuracy of 89.91%. 
In the third stage, a hybrid data augmentation technique combining Mixup and a convolutional autoencoder is adopted alongside a transfer learning strategy to mitigate data scarcity under cross-domain conditions, boosting the small-sample classification accuracy to 84.45%. The experimental results demonstrate that the models in each stage of the proposed framework significantly outperform traditional methods and baseline networks. This study provides a novel methodology and technical support for intelligent sound field identification in data-scarce marine environments. Finally, the core contributions and current limitations are summarized, and future research directions, such as constructing a dynamic hydrological parameter feedback mechanism and identifying three-dimensional complex sound fields, are outlined. Full article
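
As an illustration of the Mixup component named in the abstract above, the following is a minimal sketch of standard Mixup on a pair of images and one-hot labels. It is not the authors' implementation: the function name `mixup`, the `alpha` value, the 64 x 64 image size, and the four-class label layout are assumptions chosen for the example.

```python
import numpy as np

def mixup(x_a, y_a, x_b, y_b, alpha=0.4, rng=None):
    """Blend two samples and their one-hot labels with a Beta-distributed weight.

    alpha is a hypothetical setting; the abstract does not report one.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)            # mixing coefficient in [0, 1]
    x = lam * x_a + (1.0 - lam) * x_b       # convex combination of the inputs
    y = lam * y_a + (1.0 - lam) * y_b       # same combination of the labels
    return x, y, lam

# Stand-in data: two random "transmission loss images" and four sound field classes.
rng = np.random.default_rng(0)
img_a = rng.random((64, 64))
img_b = rng.random((64, 64))
label_a = np.array([1.0, 0.0, 0.0, 0.0])    # e.g. strong convergence
label_b = np.array([0.0, 0.0, 0.0, 1.0])    # e.g. shadow zone
mixed_img, mixed_label, lam = mixup(img_a, label_a, img_b, label_b, rng=rng)
```

The mixed label remains a valid probability vector, which is why Mixup pairs naturally with a cross-entropy loss on soft targets.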

(This article belongs to the Section Ocean Engineering)

25 pages, 16006 KB

Open Access Article

Underwater Target Recognition with Fusion of Multi-Domain Temporal Features

by Xiaochun Liu, Chenyu Wang, Yunchuan Yang, Xiangfeng Yang, Youfeng Hu and Jianguo Liu

Acoustics 2026, 8(2), 22; https://doi.org/10.3390/acoustics8020022 - 25 Mar 2026

Viewed by 911

Abstract

The dynamic nature of acoustic environments—particularly the fluctuation of underwater channels and time-varying target observation angles—poses significant challenges for active sonar target recognition, a problem further aggravated by the scarcity of labeled training samples. To address these limitations, this paper proposes a novel recognition method enabling deep fusion of multi-domain temporal features extracted from target echoes. First, complementary features are extracted across spatial, time–frequency, and Doppler domains to achieve a comprehensive and discriminative representation of targets. Subsequently, we introduce a feature vector-level fusion mechanism designed specifically for few-shot learning, integrating a meta-knowledge-driven multi-stream feature extractor with an internal memory module within the feature tensor framework. This architecture constitutes the Multi-domain Temporal Feature Fusion Recognition Network (MTFF-RNet). The proposed approach is evaluated on a hybrid dataset combining simulated and experimental data, achieving a high recognition accuracy of 96.2% for both targets and interferents. Experimental results demonstrate that MTFF-RNet significantly enhances robustness and adaptability under varying underwater acoustic conditions and dynamic viewing geometries. Full article

25 pages, 13561 KB

Open Access Article

An Underwater Target Recognition Method Based on Feature Fusion and Balanced Ensemble Transfer Learning

by Haoqian Zhang, Hong Liang, Linfeng Zhu and Wenbo Gou

J. Mar. Sci. Eng. 2026, 14(6), 579; https://doi.org/10.3390/jmse14060579 - 20 Mar 2026

Cited by 1 | Viewed by 380

Abstract

In underwater target recognition scenarios, challenges arise as a result of the limited representational capability of acoustic images with single time-frequency features and poor recognition performance due to class imbalances in sample numbers. To tackle these issues, this paper proposes an underwater target recognition method based on feature fusion and balanced ensemble transfer learning. A LiT-INN dual-branch auto-encoder network architecture is employed for time-frequency image feature fusion to solve the weak feature representation capability of single time–frequency features. The Restormer network serves as a shared feature encoder to extract fundamental features, enabling feature fusion of underwater target echo time–frequency image data and generating a fusion image dataset with richer feature information. In order to address class imbalance in sample sizes, a balanced ensemble transfer learning method is constructed using a two-stage decoupled fine-tuning learning method. The first stage employs a uniform sampler strategy to fine-tune the feature extraction module of a pre-trained transfer learning model. The second stage uses multiple balanced sampling optimization methods to fine-tune the classifier. Then, a weight averaging ensemble learning method performs decision-level fusion of multiple weak classifiers. Field test data from three target classes validated the performance of the algorithm, demonstrating a 3% improvement in average recognition accuracy compared to deep transfer learning methods under different imbalance ratios. This method effectively enhances recognition performance for classes with limited samples while significantly boosting overall recognition accuracy, offering a novel solution for underwater target recognition. Full article
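
The decision-level fusion step described above can be illustrated with a weighted average of per-classifier class probabilities. This is a sketch under assumptions, not the paper's method: the function name `weighted_average_fusion`, the equal weights, and the probability values are hypothetical.

```python
import numpy as np

def weighted_average_fusion(prob_list, weights):
    """Fuse class-probability outputs of several classifiers by weighted averaging.

    prob_list: list of arrays, each (n_samples, n_classes)
    weights:   one non-negative weight per classifier (normalized here)
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    stacked = np.stack(prob_list)               # (n_classifiers, n_samples, n_classes)
    fused = np.tensordot(w, stacked, axes=1)    # (n_samples, n_classes)
    return fused, fused.argmax(axis=1)

# Two hypothetical classifiers, one sample, three target classes.
p1 = np.array([[0.6, 0.3, 0.1]])
p2 = np.array([[0.2, 0.7, 0.1]])
fused_probs, fused_pred = weighted_average_fusion([p1, p2], weights=[1.0, 1.0])
```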

(This article belongs to the Section Ocean Engineering)

21 pages, 20926 KB

Open Access Article

Research on Neuro-Acoustic Human–Machine Collaborative Inter-Domain Global Attention Fusion for Underwater Acoustic Target Recognition

by Jiaqi Zhang, Zhangsong Shi, Huihui Xu, Zhe Rao, Songxue Bai and Junfeng Gao

J. Mar. Sci. Eng. 2026, 14(6), 578; https://doi.org/10.3390/jmse14060578 - 20 Mar 2026

Viewed by 437

Abstract

To enhance the adaptability of current underwater acoustic target recognition technology in complex marine environments and improve the performance of human–machine collaborative operations, this study proposes a human–machine collaborative underwater acoustic target recognition technology based on brain–computer interface technology. The method combines synchronized features from underwater acoustic signals and human neural responses, and proposes an inter-domain global attention fusion module that explores the fusion relationship of features at different depths and enhances joint feature expression by exploiting potentially complementary information between modalities. The experimental results show that the proposed network model can enhance the feature discrimination ability and obtain a more stable recognition model. Compared to a single feature, the human–machine collaborative fusion-feature model exhibits stronger classification performance, with an average classification accuracy of 96.4444%. This method can alleviate the limitations of single-mode underwater acoustic target recognition technology, combine the complementary advantages of humans and machines to achieve effective human–machine cooperation, and provide new insights for future underwater recognition technology and marine research. Full article

(This article belongs to the Section Ocean Engineering)

27 pages, 28242 KB

Open Access Article

Physics-Informed Side-Scan Sonar Perception: Tackling Weak Targets and Sparse Debris via Geometric and Frequency Decoupling

by Bojian Yu, Rongsheng Lin, Hanxiang Zhou, Jianxiong Zhang and Xinwei Zhang

Sensors 2026, 26(6), 1938; https://doi.org/10.3390/s26061938 - 19 Mar 2026

Viewed by 532

Abstract

Side-scan sonar (SSS) serves as the primary perceptual instrument for Autonomous Underwater Vehicles (AUVs) in large-scale marine search and rescue (SAR) operations. However, the detection of critical targets is frequently hindered by severe hydro-acoustic noise, the spatial discontinuity of wreckage, and the weak visual signatures of small targets. To surmount these challenges, this paper presents WPG-DetNet. First, we introduce a Wavelet-Embedded Residual Backbone (WERB) to reconstruct the conventional downsampling paradigm. By substituting standard pooling with the Discrete Wavelet Transform (DWT), this architecture explicitly disentangles high-frequency noise from structural information in the frequency domain, thereby achieving the adaptive preservation of edge fidelity for large human-made targets while filtering out speckle interference. Then, addressing the distinct challenge of discontinuous aircraft wreckage, the framework further incorporates a Debris Graph Reasoning Module (D-GRM). This module models scattered fragments as nodes in a topological graph to capture long-range semantic dependencies, transforming isolated instance recognition into context-aware scene understanding. Finally, to bridge the gap between AI and underwater physics, we design a Shadow-Aided Decoupling Head (SADH) equipped with a physics-informed geometric loss. By enforcing mathematical consistency between target height and acoustic shadow length, this mechanism establishes a rigorous discriminative criterion capable of distinguishing weak-echo human bodies from seabed rocks based on shadow geometry. Experiments on the SCTD dataset demonstrate that WPG-DetNet achieves a mean Average Precision ($mAP_{50}$) of 97.5% and a Recall of 96.9%. Quantitative analysis reveals that our framework outperforms the classic Faster R-CNN by a margin of 12.8% in $mAP_{50}$ and surpasses the Transformer-based RT-DETR-R18 by 5.6% in high-precision localization metrics ($mAP_{50:95}$). Simultaneously, WPG-DetNet maintains superior efficiency with an inference speed of 62.5 FPS and a lightweight parameter count of 16.8 M, striking an optimal balance between robust perception and the real-time constraints of AUV operations. Full article
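
The height and shadow-length consistency mentioned above can be illustrated with the textbook similar-triangles relation for side-scan sonar over a flat seabed, h = H * L / (R + L), where H is the sonar altitude, R the slant range to the target, and L the shadow length. The sketch below is only that textbook approximation; the abstract does not give the form of the paper's geometric loss, and the function names and the absolute-difference penalty are hypothetical.

```python
def target_height_from_shadow(sonar_altitude_m, slant_range_m, shadow_length_m):
    """Estimate target height from its acoustic shadow (flat-seabed approximation)."""
    far_edge = slant_range_m + shadow_length_m   # slant range to the end of the shadow
    if far_edge <= 0:
        raise ValueError("slant range plus shadow length must be positive")
    return sonar_altitude_m * shadow_length_m / far_edge

def shadow_consistency_penalty(predicted_height_m, sonar_altitude_m,
                               slant_range_m, shadow_length_m):
    """Hypothetical penalty: gap between a predicted height and the shadow-implied one."""
    implied = target_height_from_shadow(sonar_altitude_m, slant_range_m, shadow_length_m)
    return abs(predicted_height_m - implied)

# Sonar 10 m above the seabed, target at 30 m slant range, 10 m long shadow.
height = target_height_from_shadow(10.0, 30.0, 10.0)          # 10 * 10 / 40
penalty = shadow_consistency_penalty(2.0, 10.0, 30.0, 10.0)   # |2.0 - 2.5|
```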

(This article belongs to the Section Physical Sensors)

33 pages, 40054 KB

Open Access Article

MVDCNN: A Multi-View Deep Convolutional Network with Feature Fusion for Robust Sonar Image Target Recognition

by Yue Fan, Cheng Peng, Peng Zhang, Zhisheng Zhang, Guoping Zhang and Jinsong Tang

Remote Sens. 2026, 18(1), 76; https://doi.org/10.3390/rs18010076 - 25 Dec 2025

Cited by 1 | Viewed by 1004

Abstract

Automatic Target Recognition (ATR) in single-view sonar imagery is severely hampered by geometric distortions, acoustic shadows, and incomplete target information due to occlusions and the slant-range imaging geometry, which frequently give rise to misclassification and hinder practical underwater detection applications. To address these critical limitations, this paper proposes a Multi-View Deep Convolutional Neural Network (MVDCNN) based on feature-level fusion for robust sonar image target recognition. The MVDCNN adopts a highly modular and extensible architecture consisting of four interconnected modules: an input reshaping module that adapts multi-view images to match the input format of pre-trained backbone networks via dimension merging and channel replication; a shared-weight feature extraction module that leverages Convolutional Neural Network (CNN) or Transformer backbones (e.g., ResNet, Swin Transformer, Vision Transformer) to extract discriminative features from each view, ensuring parameter efficiency and cross-view feature consistency; a feature fusion module that aggregates complementary features (e.g., target texture and shape) across views using max-pooling to retain the most salient characteristics and suppress noisy or occluded view interference; and a lightweight classification module that maps the fused feature representations to target categories. Additionally, to mitigate the data scarcity bottleneck in sonar ATR, we design a multi-view sample augmentation method based on sonar imaging geometric principles: this method systematically combines single-view samples of the same target via the combination formula and screens valid samples within a predefined azimuth range, constructing high-quality multi-view training datasets without relying on complex generative models or massive initial labeled data. 
Comprehensive evaluations on the Custom Side-Scan Sonar Image Dataset (CSSID) and Nankai Sonar Image Dataset (NKSID) demonstrate the superiority of our framework over single-view baselines. Specifically, the two-view MVDCNN achieves average classification accuracies of 94.72% (CSSID) and 97.24% (NKSID), with relative improvements of 7.93% and 5.05%, respectively; the three-view MVDCNN further boosts the average accuracies to 96.60% and 98.28%. Moreover, MVDCNN substantially elevates the precision and recall of small-sample categories (e.g., Fishing net and Small propeller in NKSID), effectively alleviating the class imbalance challenge. Mechanism validation via t-Distributed Stochastic Neighbor Embedding (t-SNE) feature visualization and prediction confidence distribution analysis confirms that MVDCNN yields more separable feature representations and more confident category predictions, with stronger intra-class compactness and inter-class discrimination in the feature space. The proposed MVDCNN framework provides a robust and interpretable solution for advancing sonar ATR and offers a technical paradigm for multi-view acoustic image understanding in complex underwater environments. Full article
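
Two steps from the abstract above lend themselves to a short sketch: building multi-view samples by combining single-view samples of the same target within an azimuth range, and fusing per-view features by element-wise max-pooling. The code is an illustration under assumptions, not the MVDCNN implementation: the function names, the screening rule (maximum azimuth spread, without wrap-around at 360 degrees), and the toy feature vectors are hypothetical.

```python
from itertools import combinations
import numpy as np

def make_multiview_samples(views, n_views, max_azimuth_span_deg):
    """Combine single-view samples of one target into multi-view samples.

    views: list of (azimuth_deg, feature_vector) tuples for the same target.
    Keeps only combinations whose azimuth spread is within the given span.
    """
    samples = []
    for combo in combinations(views, n_views):
        azimuths = [azimuth for azimuth, _ in combo]
        if max(azimuths) - min(azimuths) <= max_azimuth_span_deg:
            samples.append(combo)
    return samples

def fuse_max(features):
    """Element-wise max-pooling across per-view feature vectors."""
    return np.max(np.stack(features, axis=0), axis=0)

# Four single views of one target at different azimuths (toy 2-D features).
views = [(0.0, np.array([1.0, 5.0])),
         (20.0, np.array([3.0, 2.0])),
         (40.0, np.array([0.5, 0.5])),
         (90.0, np.array([9.0, 9.0]))]
pairs = make_multiview_samples(views, n_views=2, max_azimuth_span_deg=45.0)
fused = fuse_max([feature for _, feature in pairs[0]])   # views at 0 and 20 degrees
```

Max-pooling keeps the strongest response per feature dimension, so a view in which the target is occluded contributes little to the fused vector.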

(This article belongs to the Special Issue Underwater Remote Sensing: Status, New Challenges and Opportunities)

19 pages, 1786 KB

Open Access Article

Path-Routing Convolution and Scalable Lightweight Networks for Robust Underwater Acoustic Target Recognition

by Yue Zhao, Menghan Chen, Yuchen Lu, Liangliang Cheng, Cheng Chen, Yifei Li and Nizar Faisal Alkayem

Sensors 2025, 25(22), 7007; https://doi.org/10.3390/s25227007 - 17 Nov 2025

Viewed by 943

Abstract

Maritime traffic surveillance and ocean environmental protection urgently require the accurate identification of surface vessel types. Although deep learning methods have significantly improved the underwater acoustic target recognition performance, the existing models suffer from large parameter counts and fail to adapt to the multi-scale spectral features of radiated noise from different vessel types, restricting their practical deployment on power-constrained underwater sensors. To address these challenges, this paper proposes a novel path-routing convolution mechanism that achieves the discriminative extraction of cross-scale acoustic features through multi-dilation-rate parallel paths and an adaptive routing strategy and designs the MobilePR-ConvNet unified architecture that enables a single framework to automatically adapt to diverse hardware platforms through systematic width scaling. Experiments on the DeepShip and ShipsEar datasets demonstrate that the proposed method achieved 98.58% and 97.82% recognition accuracies, respectively, while maintaining a 77.8% robust performance under 10 dB low-signal-to-noise-ratio conditions, validating the cross-dataset generalization capability in complex marine environments and providing an effective solution for intelligent deployment on resource-constrained underwater devices. Full article

(This article belongs to the Special Issue Advances in Automatic Speech Recognition, Audio and Underwater Acoustic Signal Analysis)

31 pages, 15645 KB

Open Access Article

RCF-YOLOv8: A Multi-Scale Attention and Adaptive Feature Fusion Method for Object Detection in Forward-Looking Sonar Images

by Xiaoxue Li, Yuhan Chen, Xueqin Liu, Zhiliang Qin, Jiaxin Wan and Qingyun Yan

Remote Sens. 2025, 17(19), 3288; https://doi.org/10.3390/rs17193288 - 25 Sep 2025

Cited by 2 | Viewed by 2708

Abstract

Acoustic imaging systems are essential for underwater target recognition and localization, but forward-looking sonar (FLS) imagery faces challenges due to seabed variability, resulting in low resolution, blurred images, and sparse targets. To address these issues, we introduce RCF-YOLOv8, an enhanced detection framework based on YOLOv8, designed to improve FLS image analysis. Key innovations include the use of CoordConv modules to better encode spatial information, improving feature extraction and reducing misdetection rates. Additionally, an efficient multi-scale attention (EMA) mechanism addresses sparse target distributions, optimizing feature fusion and improving the network’s ability to identify key areas. Lastly, the C2f module with high-quality feature fusion (C2f-Fusion) optimizes feature extraction from noisy backgrounds. RCF-YOLOv8 achieved a 98.8% mAP@50 and a 67.6% mAP@50-95 on the URPC2021 dataset, outperforming baseline models with a 2.4% increase in single-threshold accuracy and a 10.4% increase in multi-threshold precision, demonstrating its robustness for underwater detection. Full article
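
The CoordConv idea referenced above is to append normalized coordinate channels to a feature map before convolution, so that filters can condition on position. A minimal NumPy sketch follows; it shows only the coordinate-channel step, and the function name `add_coord_channels` and the [-1, 1] normalization are assumptions rather than details taken from RCF-YOLOv8.

```python
import numpy as np

def add_coord_channels(batch):
    """Append y and x coordinate channels to a (N, C, H, W) batch.

    Coordinates are scaled to [-1, 1]; the result has C + 2 channels.
    """
    n, _, h, w = batch.shape
    ys = np.linspace(-1.0, 1.0, h).reshape(1, 1, h, 1)
    xs = np.linspace(-1.0, 1.0, w).reshape(1, 1, 1, w)
    y_chan = np.broadcast_to(ys, (n, 1, h, w))
    x_chan = np.broadcast_to(xs, (n, 1, h, w))
    return np.concatenate([batch, y_chan, x_chan], axis=1)

# Two single-channel 4 x 5 "sonar images".
images = np.zeros((2, 1, 4, 5))
with_coords = add_coord_channels(images)
```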

(This article belongs to the Special Issue Efficient Object Detection Based on Remote Sensing Images)

19 pages, 3327 KB

Open Access Article

Design and Research of High-Energy-Efficiency Underwater Acoustic Target Recognition System

by Ao Ma, Wenhao Yang, Pei Tan, Yinghao Lei, Liqin Zhu, Bingyao Peng and Ding Ding

Electronics 2025, 14(19), 3770; https://doi.org/10.3390/electronics14193770 - 24 Sep 2025

Cited by 1 | Viewed by 1161

Abstract

Recently, with the rapid development of underwater resource exploration and underwater activities, underwater acoustic (UA) target recognition has become crucial in marine resource exploration. However, traditional underwater acoustic recognition systems face challenges such as low energy efficiency, poor accuracy, and slow response times. Systems for UA target recognition using deep learning networks have garnered widespread attention. Convolutional neural networks (CNNs) consume significant computational resources and energy during convolution operations, which exacerbates energy consumption and complicates edge deployment. This paper explores a high-energy-efficiency UA target recognition system. Based on the DenseNet CNN, the system uses fine-grained pruning for sparsification and sparse convolution computations. The UA target recognition CNN was deployed on FPGAs and chips to achieve low-power recognition. Using the noise-disturbed ShipsEar dataset, the system reaches a recognition accuracy of 98.73% at 0 dB signal-to-noise ratio (SNR). After 50% fine-grained pruning, the accuracy is 96.11%. The circuit prototype on FPGA shows that the circuit achieves an accuracy of 95% at 0 dB SNR. This work implements the circuit design and layout of the UA target recognition chip based on a 65 nm CMOS process. DC synthesis results show that the power consumption is 90.82 mW, and the single-target recognition time is 7.81 ns. Full article
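
Fine-grained pruning removes individual weights rather than whole filters. The sketch below zeroes the 50% of weights with the smallest magnitude, which is one common criterion; the abstract does not state which criterion the authors use, so the magnitude rule, the function name `magnitude_prune`, and the example weights are assumptions.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    """Zero out the smallest-magnitude fraction of individual weights.

    Returns the pruned weights and the boolean keep-mask.
    Ties at the threshold are pruned, so slightly more than the requested
    fraction may be removed when magnitudes repeat.
    """
    flat = np.abs(weights).ravel()
    k = int(round(sparsity * flat.size))       # number of weights to remove
    if k == 0:
        return weights.copy(), np.ones(weights.shape, dtype=bool)
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return np.where(mask, weights, 0.0), mask

w = np.array([[0.1, -2.0, 0.03, 1.5],
              [-0.4, 0.9, -0.05, 3.0]])
pruned, keep_mask = magnitude_prune(w, sparsity=0.5)
```

The resulting mask is what a sparse convolution engine would exploit: multiplications for zeroed weights can be skipped.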

(This article belongs to the Special Issue Digital Intelligence Technology and Applications)

30 pages, 3528 KB

Open Access Article

Multi-Task Mixture-of-Experts Model for Underwater Target Localization and Recognition

by Peng Qian, Jingyi Wang, Yining Liu, Yingxuan Chen, Pengjiu Wang, Yanfa Deng, Peng Xiao and Zhenglin Li

Remote Sens. 2025, 17(17), 2961; https://doi.org/10.3390/rs17172961 - 26 Aug 2025

Viewed by 2300

Abstract

The scarcity of underwater acoustic data in deep and remote sea environments poses a significant challenge to data-driven target recognition models, severely restricting their performance. To address this challenge, this study presents a ray-theory-based data augmentation method for generating synthetic ship-radiated noise datasets in oceanic environments at a depth of 3500 m—DS3500, encompassing both direct and shadow zones. Additionally, a novel MEG (multi-task, multi-expert, multi-gate) framework is developed to achieve simultaneous target localization and recognition by integrating relative positional information between the target and sonar, which dynamically partitions parameter spaces through multi-expert mechanisms and adaptively combines task-specific representations using multi-gate attention to simultaneously predict target localization and recognition. Experimental results on the DS3500 dataset demonstrate that the MEG framework achieves 95.93% recognition accuracy, a range localization error of 0.2011 km and a depth localization error of 20.61 m with a maximum detection range of 11 km and depth of 1100 m. This study provides a new technical solution for underwater acoustic target recognition in deep and remote seas, offering innovative approaches for practical applications in marine monitoring and defense. Full article
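
The multi-expert, multi-gate structure described above resembles a generic multi-gate mixture-of-experts layer: shared experts produce representations, and each task has its own softmax gate that weights them. The forward pass below is a sketch of that generic pattern, not the MEG framework itself; the layer sizes, the tanh experts, and the names `mmoe_forward`, `expert_weights`, and `gate_weights` are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)    # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def mmoe_forward(x, expert_weights, gate_weights):
    """Forward pass of a multi-gate mixture-of-experts layer.

    x:              (batch, d_in)
    expert_weights: (n_experts, d_in, d_hidden), one linear expert each
    gate_weights:   (n_tasks, d_in, n_experts), one gate per task
    Returns a list with one (batch, d_hidden) representation per task.
    """
    expert_out = np.tanh(np.einsum("bi,eih->beh", x, expert_weights))
    task_reprs = []
    for gate_w in gate_weights:
        gate = softmax(x @ gate_w, axis=-1)                     # (batch, n_experts)
        task_reprs.append(np.einsum("be,beh->bh", gate, expert_out))
    return task_reprs

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))                 # 4 samples, 8 input features
experts = rng.standard_normal((3, 8, 16))       # 3 experts
gates = rng.standard_normal((2, 8, 3))          # 2 tasks: localization, recognition
loc_repr, rec_repr = mmoe_forward(x, experts, gates)
```

Each task representation would then feed its own head, for example a regression head for range and depth and a classification head for target type.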

19 pages, 2289 KB

Open Access Article

Class-Incremental Learning-Based Few-Shot Underwater-Acoustic Target Recognition

by Wenbo Wang, Ye Li, Tongsheng Shen and Dexin Zhao

J. Mar. Sci. Eng. 2025, 13(9), 1606; https://doi.org/10.3390/jmse13091606 - 22 Aug 2025

Cited by 1 | Viewed by 1410

Abstract

This paper proposes an underwater-acoustic class-incremental few-shot learning (UACIL) method for streaming data processing in practical underwater-acoustic target recognition scenarios. The core objective is to expand classification capabilities for new classes while mitigating catastrophic forgetting of existing knowledge. UACIL’s contributions encompass three key components: First, to enhance feature discriminability and generalization, an enhanced frequency-domain attention module is introduced to capture both spatial and temporal variation features. Second, it introduces a prototype classification mechanism with two operating modes corresponding to the base-training phase and the incremental training phase. In the base phase, sufficient pre-training is performed on the feature extraction network and the classification heads of inherent categories. In the incremental phase, for streaming data processing, only the classification heads of new categories are expanded and updated, while the parameters of the feature extractor remain stable through prototype classification. Third, a joint optimization strategy using multiple loss functions is designed to refine feature distribution. This method enables rapid deployment without complex cross-domain retraining when handling new data classes, effectively addressing overfitting and catastrophic forgetting in hydroacoustic signal classification. Experimental results with public datasets validate its superior incremental learning performance. The proposed method achieves 92.89% base recognition accuracy and maintains 68.44% overall accuracy after six increments. Compared with baseline methods, it improves base accuracy by 11.14% and reduces the incremental performance-dropping rate by 50.09%. These results demonstrate that UACIL enhances recognition accuracy while alleviating catastrophic forgetting, confirming its feasibility for practical applications. Full article
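
Prototype classification with a frozen feature extractor can be sketched as follows: each class is represented by the mean of its embedded samples, queries are assigned to the nearest prototype, and an incremental session only adds prototypes for the new classes. This is a generic nearest-class-mean illustration, not the UACIL implementation; the Euclidean distance, the function names, and the 2-D toy embeddings are assumptions.

```python
import numpy as np

def class_prototypes(features, labels):
    """Mean embedding per class: {class_id: prototype_vector}."""
    return {int(c): features[labels == c].mean(axis=0) for c in np.unique(labels)}

def predict(features, prototypes):
    """Assign each embedding to the class of its nearest prototype."""
    classes = np.array(sorted(prototypes))
    protos = np.stack([prototypes[c] for c in classes])
    dists = np.linalg.norm(features[:, None, :] - protos[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]

# Base session: two classes with plenty of data (toy embeddings).
base_feats = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
base_labels = np.array([0, 0, 1, 1])
prototypes = class_prototypes(base_feats, base_labels)

# Incremental session: a new class arrives with only two samples.
new_feats = np.array([[10.0, 0.0], [11.0, 0.0]])
new_labels = np.array([2, 2])
prototypes.update(class_prototypes(new_feats, new_labels))   # old prototypes untouched

queries = np.array([[0.2, 0.4], [5.1, 5.2], [10.2, 0.1]])
predicted = predict(queries, prototypes)
```

Because old prototypes are never rewritten, adding a class cannot overwrite what was learned for earlier classes, which is the intuition behind using prototypes against catastrophic forgetting.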

(This article belongs to the Special Issue Underwater Acoustics: Advances in Modelling, Measurement, and Technological Applications)

46 pages, 5911 KB

Open Access Article

Leveraging Prior Knowledge in Semi-Supervised Learning for Precise Target Recognition

by Guohao Xie, Zhe Chen, Yaan Li, Mingsong Chen, Feng Chen, Yuxin Zhang, Hongyan Jiang and Hongbing Qiu

Remote Sens. 2025, 17(14), 2338; https://doi.org/10.3390/rs17142338 - 8 Jul 2025

Viewed by 1432

Abstract

Underwater acoustic target recognition (UATR) is challenged by complex marine noise, scarce labeled data, and inadequate multi-scale feature extraction in conventional methods. This study proposes DART-MT, a semi-supervised framework that integrates a Dual Attention Parallel Residual Network Transformer with a mean teacher paradigm, enhanced by domain-specific prior knowledge. The architecture employs a Convolutional Block Attention Module (CBAM) for localized feature refinement, a lightweight New Transformer Encoder for global context modeling, and a novel TriFusion Block to synergize spectral–temporal–spatial features through parallel multi-branch fusion, addressing the limitations of single-modality extraction. Leveraging the mean teacher framework, DART-MT optimizes consistency regularization to exploit unlabeled data, effectively mitigating class imbalance and annotation scarcity. Evaluations on the DeepShip and ShipsEar datasets demonstrate state-of-the-art accuracy: with 10% labeled data, DART-MT achieves 96.20% (DeepShip) and 94.86% (ShipsEar), surpassing baseline models by 7.2–9.8% in low-data regimes, while reaching 98.80% (DeepShip) and 98.85% (ShipsEar) with 90% labeled data. Under varying noise conditions (−20 dB to 20 dB), the model maintained a robust performance (F1-score: 92.4–97.1%) with 40% lower variance than its competitors, and ablation studies validated each module’s contribution (TriFusion Block alone improved accuracy by 6.9%). This research advances UATR by (1) resolving multi-scale feature fusion bottlenecks, (2) demonstrating the efficacy of semi-supervised learning in marine acoustics, and (3) providing an open-source implementation for reproducibility. In future work, we will extend cross-domain adaptation to diverse oceanic environments. Full article
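
The mean teacher paradigm named above keeps a teacher model whose weights are an exponential moving average (EMA) of the student's, and penalizes disagreement between the two on unlabeled inputs. The sketch below shows those two pieces in isolation; it is not the DART-MT code, and the decay value, the mean-squared-error consistency term, and the dictionary-of-arrays parameter format are assumptions.

```python
import numpy as np

def ema_update(teacher, student, decay=0.99):
    """Move each teacher parameter toward the matching student parameter."""
    return {name: decay * teacher[name] + (1.0 - decay) * student[name]
            for name in teacher}

def consistency_loss(student_probs, teacher_probs):
    """Mean squared difference between student and teacher predictions."""
    return float(np.mean((student_probs - teacher_probs) ** 2))

teacher_params = {"w": np.zeros(3)}
student_params = {"w": np.ones(3)}
teacher_params = ema_update(teacher_params, student_params, decay=0.9)

# Predictions of both models on the same unlabeled sample (toy values).
loss = consistency_loss(np.array([0.7, 0.2, 0.1]), np.array([0.5, 0.3, 0.2]))
```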

(This article belongs to the Special Issue Remote Sensing Target Recognition and Detection: Theory and Applications (Second Edition))

22 pages, 4360 KB

Open Access Feature Paper Article

Underwater Target Recognition Method Based on Singular Spectrum Analysis and Channel Attention Convolutional Neural Network

by Fang Ji, Shaoqing Lu, Junshuai Ni, Ziming Li and Weijia Feng

Sensors 2025, 25(8), 2573; https://doi.org/10.3390/s25082573 - 18 Apr 2025

Cited by 7 | Viewed by 1493

Abstract

In order to improve the efficiency of the deep network model in processing the radiated noise signals of underwater acoustic targets, this paper introduces a Singular Spectrum Analysis and Channel Attention Convolutional Neural Network (SSA-CACNN) model. The front end of the model is designed as an SSA filter, and its input is the time-domain signal that has undergone simple preprocessing. The SSA method is utilized to separate the noise efficiently and reliably from useful signals. The first three orders of useful signals are then fed into the CACNN model, which has a convolutional layer set up at the beginning of the model to further remove noise from the signal. Then, the attention of the model to the feature signal channels is enhanced through the combination of multiple groups of convolutional operations and the channel attention mechanism, which facilitates the model’s ability to discern the essential characteristics of the underwater acoustic signals and improve the target recognition rate. Experimental results show that the signal reconstructed from the first three-order waveforms at the front end of the proposed SSA-CACNN model retains most of the features of the target. In the experimental verification using the ShipsEar dataset, the model achieved a recognition accuracy of 98.64%. The model’s parameter count of 0.26 M was notably lower than that of other comparable deep models, indicating a more efficient use of resources. Additionally, the SSA-CACNN model had a certain degree of robustness to noise, with a correct recognition rate of 84.61% maintained when the signal-to-noise ratio (SNR) was −10 dB. Finally, the SSA-CACNN model pre-trained on the ShipsEar dataset was transferred to the DeepShip dataset with a recognition accuracy of 94.98%. Full article
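
Basic singular spectrum analysis, the technique behind the SSA front end described above, embeds the signal in a trajectory (Hankel) matrix, takes its SVD, keeps the leading components, and maps the result back to a time series by diagonal averaging. The sketch below sums the first three components into one denoised signal; whether the paper sums them or feeds them separately is not stated in the abstract, and the window length, test signal, and function name are assumptions.

```python
import numpy as np

def ssa_reconstruct(signal, window, n_components=3):
    """Reconstruct a 1-D signal from its leading SSA components."""
    n = len(signal)
    k = n - window + 1
    # Trajectory matrix: entry (i, j) holds signal[i + j].
    traj = np.column_stack([signal[j:j + window] for j in range(k)])
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    approx = (u[:, :n_components] * s[:n_components]) @ vt[:n_components]
    # Diagonal averaging: row i of approx covers samples i .. i + k - 1.
    recon = np.zeros(n)
    counts = np.zeros(n)
    for i in range(window):
        recon[i:i + k] += approx[i]
        counts[i:i + k] += 1
    return recon / counts

rng = np.random.default_rng(0)
t = np.arange(400)
clean = np.sin(2 * np.pi * t / 40)                    # stand-in for a tonal line
noisy = clean + 0.3 * rng.standard_normal(t.size)     # additive noise
denoised = ssa_reconstruct(noisy, window=80, n_components=3)
```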

(This article belongs to the Section Sensor Networks)

19 pages, 21547 KB

Open Access Article

High-Frequency Passive Acoustic Recognition in Underwater Environments: Echo-Based Coding for Layered Elastic Shells

by Zixuan Dai, Zilong Peng and Suchen Xu

Appl. Sci. 2025, 15(7), 3698; https://doi.org/10.3390/app15073698 - 27 Mar 2025

Viewed by 902

Abstract

To address the restricted coding capacity and material dependency of acoustic identity tags for autonomous underwater vehicles (AUVs), this study introduces a novel passive acoustic identification tag (AID) design based on multilayered elastic cylindrical shells. By developing a Normal Mode Series (NMS) analytical model and validating it through finite element method (FEM) simulations, the work elucidates how material layering strategies regulate far-field target strength (TS) and establishes a time-domain multi-peak echo-based encoding framework. Results demonstrate that optimizing material impedance contrasts achieves 99% detection success at a 3 dB signal-to-noise ratio. Jaccard similarity analysis of 3570 material combinations reveals a system-wide average recognition error rate of 0.41%, confirming robust encoding reliability. The solution enables combinatorial expansion of coding capacity with the number of structural layers, yielding 210, 840, and 2520 unique codes for three-, four-, and five-layer configurations, respectively. These findings validate a scalable, hull-integrated acoustic identification system that overcomes material constraints while providing high-capacity encoding for compact AUVs, advancing underwater acoustic tagging technologies through physics-driven design and systematic performance validation.
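The code counts quoted in this abstract can be checked with a short calculation. The figures 210, 840, and 2520 match the number of ordered selections without repetition of three, four, and five layers from a pool of seven materials, and they sum to the 3570 combinations mentioned for the Jaccard analysis. The abstract does not state the pool size or the counting rule, so the value `N_MATERIALS = 7` and the permutation interpretation below are inferences, not the authors' stated method.

```python
from math import perm

# Assumption: seven candidate materials, each used at most once per tag,
# with layer order mattering. Neither detail is given in the abstract.
N_MATERIALS = 7

# Ordered selections of `layers` materials out of N_MATERIALS.
capacity = {layers: perm(N_MATERIALS, layers) for layers in (3, 4, 5)}
total = sum(capacity.values())

for layers, codes in capacity.items():
    print(f"{layers} layers: {codes} codes")
print(f"total: {total}")
```

Under this reading, each added layer multiplies the capacity by the number of materials still unused, which is one way to interpret the "combinatorial expansion" of coding capacity with layer count.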

(This article belongs to the Special Issue Recent Advances in Underwater Acoustic Communication)

32 pages, 4387 KB

Open Access Review

Recent Progress in Ocean Intelligent Perception and Image Processing and the Impacts of Nonlinear Noise

by Huayu Liu, Ying Li, Tao Qian and Ye Tang

Mathematics 2025, 13(7), 1043; https://doi.org/10.3390/math13071043 - 23 Mar 2025

Cited by 4 | Viewed by 1997

Abstract

Deep learning network models are crucial for processing images acquired from optical, laser, and acoustic sensors in ocean intelligent perception and target detection. This work comprehensively reviews ocean intelligent perception and image processing technology, including perception devices and image acquisition, image recognition and detection models, adaptive image processing workflows, and methods for coping with nonlinear noise interference. Image recognition and detection network models, the core of ocean image processing, are the research focus of this article, in particular the development of deep learning models such as SSD, the R-CNN series, and the YOLO series. A detailed analysis of the mathematical structure of the YOLO model and of the differences between its versions, which determine detection accuracy and inference speed, provides a deeper understanding. The article also reviews adaptive image processing workflows that critically support ocean image recognition and detection, such as image annotation, feature enhancement, and image segmentation. Research and practical applications show that nonlinear noise significantly affects underwater image processing. When combined with image enhancement, data augmentation, and transfer learning methods, deep learning algorithms can effectively address the challenges of underwater image degradation and nonlinear noise interference. This work offers a distinct perspective by highlighting the mathematical structure of network models for ocean intelligent perception and image processing. It also discusses the benefits of DL-based denoising methods in signal–noise separation and noise suppression, and is expected to inspire further research in related fields.

(This article belongs to the Special Issue Modern Trends in Nonlinear Dynamics in Ocean Engineering)
