Search Results (31)

Search Parameters:
Keywords = shot-noise representation

19 pages, 1748 KB  
Article
BiASTM-CL: Bidirectional Adaptive Spatiotemporal Modeling with Contrastive Learning for Few-Shot Action Recognition
by Jing Huang and Zijian Zhao
Electronics 2026, 15(8), 1637; https://doi.org/10.3390/electronics15081637 - 14 Apr 2026
Viewed by 285
Abstract
In few-shot action recognition (FSAR), limited annotated data and large scene variations make it difficult for models to learn stable spatial semantics and reliable temporal dynamics. As a result, spatiotemporal representations tend to be weak, and models often fail to focus on discriminative motion regions or capture frame-to-frame changes accurately. Furthermore, the insufficient fusion of local details and global context renders the learned features more susceptible to background noise and scene bias. These issues become more pronounced when background clutter is severe or when different action classes share locally similar segments, leading to unreliable support–query matching and shifted similarity distributions, which ultimately result in class confusion. To address these challenges, we propose a bidirectional adaptive spatiotemporal modeling method integrated with contrastive learning for FSAR. The method constructs attention-guided bidirectional differencing features to model inter-frame variations with semantic alignment, while suppressing background noise. It introduces a local–global interactive channel attention module to strengthen both local and global dynamic representations, and integrates dynamic distance adjustment with hard negative mining during tuple-level matching. This combination imposes contrastive constraints that enhance intra-class compactness and inter-class separability, thereby mitigating interference from cross-class similar segments. Experiments under the standard 5-way 1-shot/5-shot protocol demonstrate consistent improvements across multiple datasets, and the proposed method achieves the best performance under the 5-shot setting while remaining competitive under the 1-shot setting. Full article
(This article belongs to the Section Artificial Intelligence)
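
The tuple-level matching with hard negative mining described in the abstract can be illustrated with a minimal sketch; everything below (tensor shapes, the temperature, the margin, and the function name) is an assumption for illustration, not the authors' implementation.

```python
# A minimal sketch (not the authors' code) of a contrastive objective with
# hard negative mining for support-query matching, assuming query embeddings
# `q` (N, D), class prototypes `protos` (C, D), and query labels `y` (N,).
import torch
import torch.nn.functional as F

def contrastive_with_hard_negatives(q, protos, y, tau=0.1, margin=0.2):
    q = F.normalize(q, dim=-1)
    protos = F.normalize(protos, dim=-1)
    sim = q @ protos.t() / tau                     # (N, C) scaled cosine similarities
    ce = F.cross_entropy(sim, y)                   # InfoNCE-style term over prototypes

    # Hard negative mining: for each query, the most similar wrong-class prototype.
    neg_mask = torch.ones_like(sim, dtype=torch.bool)
    neg_mask[torch.arange(len(y)), y] = False
    hardest_neg = sim.masked_fill(~neg_mask, float('-inf')).max(dim=1).values
    pos = sim[torch.arange(len(y)), y]
    hinge = F.relu(margin / tau + hardest_neg - pos).mean()   # push hardest negative below positive
    return ce + hinge

# Example: a 5-way episode with 10 query samples in a 64-d embedding space.
loss = contrastive_with_hard_negatives(torch.randn(10, 64), torch.randn(5, 64),
                                       torch.randint(0, 5, (10,)))
```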

25 pages, 11059 KB  
Article
Few-Shot Open-Set Object Detection with a Synthesized Monument Guided by Contrastive Distilled Prompts
by Hao Chen and Ying Chen
Appl. Sci. 2026, 16(7), 3474; https://doi.org/10.3390/app16073474 - 2 Apr 2026
Viewed by 319
Abstract
Few-shot open-set object detection (FS-OSOD) remains challenging in real-world scenarios, where detectors must accurately recognize known objects from few examples while reliably rejecting vast unknown categories. Under this setting, decision boundaries between known and unknown classes are easily distorted by data scarcity and background clutter, leading to severe overfitting on base classes and overconfident misclassification of unknowns. Recent research attempts to alleviate these issues by regularizing detection heads to suppress base-class bias, or by leveraging vision–language priors through open-vocabulary alignment and prompt tuning to enhance semantic transferability. However, these solutions often overlook explicit modeling of truly out-of-set unknowns and the instability of prompt adaptation in low-data regimes, which can cause boundary drift and lead unknown proposals to be absorbed by similar seen classes or even suppressed as background. To address these issues, a guided prompt–monument network (GPMN) is proposed that jointly enhances prompt learning and feature representation learning for FS-OSOD. First, the contrastive distilled prompts (CDP) module employs a teacher–student prompt framework to decouple optimization across base, novel, and unknown classes. This strategy preserves transferability between zero-shot and few-shot settings while enhancing discrimination on base categories. Second, a synthesized monument module (SMM) maintains class-centered memory with momentum-updated prototypes and a non-parametric classifier, which compresses the overlap between seen and unseen distributions and provides a stable rejection margin for unknowns with strong co-occurrence and background noise. Compared with existing head-regularization and open-vocabulary prompt-tuning pipelines, GPMN explicitly targets both base-class bias and seen–unseen overlap at the region level. Extensive experiments on VOC10-5-5 and VOC-COCO benchmarks demonstrate that GPMN consistently improves unknown recall and few-shot mAP over representative FS-OSOD baselines. These results suggest that prompt-level decoupling mitigates base-class bias, whereas memory-anchored regularization enlarges the seen–unseen margin, jointly supporting reliable unknown rejection in scarce-supervision regimes. Full article
(This article belongs to the Special Issue Advances in Computer Vision and Digital Image Processing)
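
The synthesized monument module is described as a class-centered memory with momentum-updated prototypes and a non-parametric classifier; a minimal sketch of that general mechanism follows, with the class count, feature width, momentum, and temperature all assumed for illustration rather than taken from the paper.

```python
# A minimal sketch (assumptions, not the GPMN implementation) of a class-centered
# memory with momentum-updated prototypes and a non-parametric cosine classifier.
import torch
import torch.nn.functional as F

class PrototypeMemory:
    def __init__(self, num_classes, dim, momentum=0.9):
        self.protos = F.normalize(torch.randn(num_classes, dim), dim=-1)
        self.m = momentum

    @torch.no_grad()
    def update(self, feats, labels):
        # Momentum update of each class prototype toward the batch mean of that class.
        feats = F.normalize(feats, dim=-1)
        for c in labels.unique():
            mean_c = feats[labels == c].mean(dim=0)
            self.protos[c] = F.normalize(self.m * self.protos[c] + (1 - self.m) * mean_c, dim=-1)

    def classify(self, feats, tau=0.07):
        # Non-parametric classifier: softmax over cosine similarity to the prototypes.
        sims = F.normalize(feats, dim=-1) @ self.protos.t()
        return (sims / tau).softmax(dim=-1)

mem = PrototypeMemory(num_classes=15, dim=256)
mem.update(torch.randn(32, 256), torch.randint(0, 15, (32,)))
probs = mem.classify(torch.randn(4, 256))
```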

24 pages, 1545 KB  
Article
PMSDA: Progressive Multi-Strategy Domain Alignment for Cross-Scene Vibration Recognition in Distributed Optical Fiber Sensing
by Yuxiang Ni, Jing Cheng, Di Wu, Qianqian Duan, Linhua Jiang, Xing Hu and Dawei Zhang
Photonics 2026, 13(4), 334; https://doi.org/10.3390/photonics13040334 - 29 Mar 2026
Viewed by 524
Abstract
Distributed optical fiber vibration sensing (DVS) has shown strong potential in perimeter security, pipeline leakage monitoring, transportation safety, and structural health diagnostics owing to its high sensitivity, long-range coverage, and immunity to electromagnetic interference. However, severe cross-scene distribution mismatch is often encountered in real-world deployments: indoor, outdoor, and pipeline environments exhibit markedly different noise patterns and time–frequency characteristics, thereby degrading the generalization ability of models trained in a single scene. To address this challenge, we propose a Progressive Multi-Strategy Domain Alignment (PMSDA) framework for label-disjoint cross-scene vibration recognition. PMSDA uses a compact expansion–compression encoder together with complementary alignment mechanisms—maximum mean discrepancy (MMD), correlation alignment (CORAL), and adversarial domain discrimination—to learn a scene-robust latent space from a labeled indoor source and two unlabeled target domains (outdoor and pipeline) within a single alternating-training model. Because the fine-grained source and target label spaces are disjoint, PMSDA is formulated as a representation-transfer framework rather than a standard label-shared unsupervised domain adaptation method; target-domain recognition is therefore performed through domain-specific prototype clustering in the aligned latent space. On three representative scenes with nine event classes in total, PMSDA achieved 89.5% accuracy, 86.7% macro-F1, and 0.93 AUC for Indoor→Outdoor, and 85.8%, 84.7%, and 0.87, respectively, for Indoor→Pipeline, outperforming traditional feature+SVM/RF pipelines, CNN/ResNet baselines, and representation-transfer baselines adapted from DANN/CDAN/SHOT under the same evaluation protocol. These results indicate that PMSDA is a promising and effective framework for offline cross-scene DVS evaluation under disjoint target event sets. Full article
(This article belongs to the Special Issue Machine Learning and Artificial Intelligence for Optical Networks)
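
MMD and CORAL are standard statistical alignment losses; a minimal sketch of both, computed between a source and a target feature batch, is given below. The kernel bandwidth and batch shapes are assumptions, and the adversarial domain-discrimination branch is omitted.

```python
# A minimal sketch of the two statistical alignment losses named in the abstract,
# computed between source (xs) and target (xt) feature batches.
import torch

def mmd_rbf(xs, xt, sigma=1.0):
    # Maximum mean discrepancy with a single RBF kernel (bandwidth is an assumption).
    def k(a, b):
        d = torch.cdist(a, b).pow(2)
        return torch.exp(-d / (2 * sigma ** 2))
    return k(xs, xs).mean() + k(xt, xt).mean() - 2 * k(xs, xt).mean()

def coral(xs, xt):
    # Correlation alignment: match second-order statistics (feature covariances).
    def cov(x):
        x = x - x.mean(dim=0, keepdim=True)
        return (x.t() @ x) / (x.size(0) - 1)
    d = xs.size(1)
    return (cov(xs) - cov(xt)).pow(2).sum() / (4 * d * d)

xs, xt = torch.randn(64, 128), torch.randn(64, 128)
align_loss = mmd_rbf(xs, xt) + coral(xs, xt)
```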

26 pages, 13700 KB  
Article
DG-Net: Few-Shot Remote Sensing Detection with Dynamic Dual-Stream Collaboration and Generative Meta-Learning
by Shanliang Liu, Xinnan Shao, Yan Dong, Qihang He and Chunlei Li
Symmetry 2026, 18(3), 461; https://doi.org/10.3390/sym18030461 - 7 Mar 2026
Viewed by 345
Abstract
Existing research has demonstrated that meta-learning methods hold considerable promise in addressing the challenges posed by few-shot object detection. However, remote sensing scenarios present two major challenges. The sparse features of small objects provide insufficient support information for query enhancement, and significant morphological variations caused by lighting and viewpoint differences hinder intra-class consistency capture via direct alignment in few-shot learning. To address these challenges, we propose a generative meta-learning detection framework. The framework first introduces a Dynamic Relation Dual-Stream Network to achieve dynamic support-query feature alignment through joint modeling of evolutionary and relational features, thereby enhancing representation in few-shot conditions. Second, an Optimal Transport-based Generative Meta-Learner is developed to mitigate feature distribution bias via generative augmentation in latent space. Additionally, an Orthogonal Frequency Decomposition Head is incorporated to adaptively separate query features into low-frequency contour and high-frequency detail components, effectively suppressing background noise interference. Experiments on multiple remote sensing datasets demonstrate that the proposed method achieves consistent performance gains over leading baseline methods in various few-shot settings. Its effectiveness is further validated across different backbone networks, highlighting strong generalization in few-shot remote sensing object detection. Full article
(This article belongs to the Special Issue Symmetry/Asymmetry in Evolutionary Computation and Machine Learning)
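
The Orthogonal Frequency Decomposition Head separates query features into low-frequency contour and high-frequency detail components; the sketch below shows one generic way to perform such a split with a fixed radial cutoff in the Fourier domain. The paper's head is adaptive and learnable, so the cutoff, shapes, and function name here are purely illustrative assumptions.

```python
# A minimal sketch of splitting a feature map into low- and high-frequency parts
# with a fixed radial cutoff in the 2D Fourier domain (not the paper's module).
import torch

def frequency_split(feat, cutoff=0.25):
    # feat: (B, C, H, W) real-valued feature map.
    B, C, H, W = feat.shape
    spec = torch.fft.fftshift(torch.fft.fft2(feat), dim=(-2, -1))
    yy, xx = torch.meshgrid(torch.linspace(-0.5, 0.5, H),
                            torch.linspace(-0.5, 0.5, W), indexing="ij")
    radius = (yy ** 2 + xx ** 2).sqrt()
    low_mask = (radius <= cutoff).float()        # keep only low spatial frequencies
    low = torch.fft.ifft2(torch.fft.ifftshift(spec * low_mask, dim=(-2, -1))).real
    high = feat - low                            # complementary high-frequency detail component
    return low, high

low, high = frequency_split(torch.randn(2, 64, 32, 32))
```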

29 pages, 2340 KB  
Article
Target-Aware Bilingual Stance Detection in Social Media Using Transformer Architecture
by Abdul Rahaman Wahab Sait and Yazeed Alkhurayyif
Electronics 2026, 15(4), 830; https://doi.org/10.3390/electronics15040830 - 14 Feb 2026
Viewed by 318
Abstract
Stance detection has emerged as an essential tool in natural language processing for understanding how individuals express agreement, disagreement, or neutrality toward specific targets in social and online discourse. It plays a crucial role in bilingual and multilingual environments, including English-Arabic social media ecosystems, where differences in language structure, discourse style, and data availability pose significant challenges for reliable stance modelling. Existing approaches often struggle with target awareness, cross-lingual generalization, robustness to noisy user-generated text, and the interpretability of model decisions. This study aims to build a reliable, explainable target-aware bilingual stance-detection framework that generalizes across heterogeneous stance formats and languages without retraining on a dataset specific to the target language. Thus, a unified dual-encoder architecture based on mDeBERTa-v3 is proposed. Cross-language contrastive learning offers an auxiliary training objective to align English and Arabic stance representations in a common semantic space. Robustness-oriented regularization is used to mitigate the effects of informal language, vocabulary variation, and adversarial noise. To promote transparency and trustworthiness, the framework incorporates token-level rationale extraction, enables fine-grained interpretability, and supports analysis of hallucination. The proposed model is tested on a combined bilingual test set and two structurally distinct zero-shot benchmarks: MT-CSD and AraStance. Experimental results show consistent performance, with accuracies of 85.0% and 86.8% and F1-scores of 84.7% and 86.8% on the zero-shot benchmarks, confirming stable performance and realistic generalization. Ultimately, these findings reveal that effective bilingual stance detection can be achieved via explicit target conditioning, cross-lingual alignment, and explainability-driven design. Full article
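
The auxiliary cross-language contrastive objective can be sketched as a symmetric InfoNCE loss over paired English and Arabic sentence embeddings; the batch size, embedding width, and temperature below are assumptions, not values from the paper.

```python
# A minimal sketch of CLIP-style symmetric contrastive alignment between paired
# English and Arabic embeddings (row i of each batch is a translation pair).
import torch
import torch.nn.functional as F

def cross_lingual_infonce(z_en, z_ar, tau=0.05):
    # z_en, z_ar: (B, D) sentence embeddings from the dual encoder.
    z_en, z_ar = F.normalize(z_en, dim=-1), F.normalize(z_ar, dim=-1)
    logits = z_en @ z_ar.t() / tau               # (B, B) pairwise similarities
    targets = torch.arange(z_en.size(0))         # matched pairs lie on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

loss = cross_lingual_infonce(torch.randn(16, 768), torch.randn(16, 768))
```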

29 pages, 44274 KB  
Article
MSFFDet: A Meta-Learning-Based Support-Guided Feature Fusion Detector for Few-Shot Remote Sensing Detection
by Haoxiang Qi, Wenzhe Zhao, Ting Zhang and Guangyao Zhou
Appl. Sci. 2026, 16(2), 917; https://doi.org/10.3390/app16020917 - 15 Jan 2026
Viewed by 442
Abstract
Few-shot object detection in remote sensing imagery faces significant challenges, including limited labeled samples, complex scene backgrounds, and subtle inter-class differences. To tackle these issues, we design a novel detection framework that effectively transfers supervision from a few annotated support examples to the query domain. We introduce a feature enhancement mechanism that injects fine-grained support cues into the query representation, helping the model focus on relevant regions and suppress background noise. This allows the model to generate more accurate proposals and perform robust classification, especially for visually confusing or small objects. Additionally, our method enhances feature interaction between support and query images through a nonlinear combination strategy, which captures both semantic similarity and discriminative differences. The proposed framework is fully end-to-end and jointly optimizes the feature fusion and detection processes. Experiments on three challenging benchmarks, NWPU VHR-10, iSAID and DIOR, demonstrate that our method consistently achieves state-of-the-art results under different few-shot settings and category splits. Compared with other advanced methods, it yields superior performance, highlighting its strong generalization ability in low-data remote sensing scenarios. Full article
(This article belongs to the Special Issue AI in Object Detection)
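
The abstract describes a nonlinear combination of support and query features that captures both semantic similarity and discriminative differences; one common way to realize such a combination (element-wise product plus difference followed by a learned projection) is sketched below. The module name and layer sizes are assumptions, since the exact formulation is not given in the abstract.

```python
# A minimal sketch of a support-guided query fusion block (illustrative, not MSFFDet's).
import torch
import torch.nn as nn

class SupportQueryFusion(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Conv2d(3 * channels, channels, kernel_size=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, query, support):
        # query: (B, C, H, W); support: (B, C, 1, 1) pooled class vector broadcast over space.
        sim = query * support            # semantic-similarity cue
        diff = query - support           # discriminative-difference cue
        return self.proj(torch.cat([query, sim, diff], dim=1))

fusion = SupportQueryFusion(256)
out = fusion(torch.randn(2, 256, 32, 32), torch.randn(2, 256, 1, 1))
```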

22 pages, 8610 KB  
Article
A Lightweight Degradation-Aware Framework for Robust Object Detection in Adverse Weather
by Seungun Park, Jiakang Kuai, Hyunsu Kim, Hyunseong Ko, ChanSung Jung and Yunsik Son
Electronics 2026, 15(1), 146; https://doi.org/10.3390/electronics15010146 - 29 Dec 2025
Cited by 1 | Viewed by 818
Abstract
Object detection in adverse weather remains challenging due to the simultaneous degradation of visibility, structural boundaries, and semantic consistency. Existing restoration-driven or multi-branch detection approaches often fail to recover task-relevant features or introduce substantial computational overhead. To address this problem, DLC-SSD, a lightweight degradation-aware framework for robust object detection in adverse weather environments, is proposed. The framework integrates image enhancement and feature refinement into a single detection pipeline and adopts a hierarchical strategy in which global and local degradations are corrected at the image level, structural cues are reinforced in shallow high-resolution features, and semantic representations are refined in deep layers to suppress weather-induced noise. These components are jointly optimized end-to-end with the single-shot multibox detection (SSD) backbone. In rain, fog, and low-light conditions, DLC-SSD demonstrated more stable performance than conventional detectors and maintained a quasi-real-time inference speed, confirming its practicality in intelligent monitoring and autonomous driving environments. Full article

35 pages, 3744 KB  
Review
Intelligent Fault Diagnosis for HVDC Systems Based on Knowledge Graph and Pre-Trained Models: A Critical and Comprehensive Review
by Qiang Li, Yue Ma, Jinyun Yu, Shenghui Cao, Shihong Zhang, Pengwang Zhang and Bo Yang
Energies 2025, 18(24), 6438; https://doi.org/10.3390/en18246438 - 9 Dec 2025
Viewed by 954
Abstract
High-voltage direct-current (HVDC) systems are essential for large-scale renewable integration and asynchronous interconnection, yet their complex topologies and multi-type faults expose the limits of threshold- and signal-based diagnostics. These methods degrade under noisy, heterogeneous measurements acquired under dynamic operating conditions, resulting in poor adaptability, reduced accuracy, and high latency. To overcome these shortcomings, the synergistic use of knowledge graphs (KGs) and pre-trained models (PTMs) is emerging as a next-generation paradigm. KGs encode equipment parameters, protection logic, and fault propagation paths in an explicit, human-readable structure, while PTMs provide transferable representations that remain effective under label scarcity and data diversity. Coupled within a perception–cognition–decision loop, PTMs first extract latent fault signatures from multi-modal records; KGs then enable interpretable causal inference, yielding both precise localization and transparent explanations. This work systematically reviews the theoretical foundations, fusion strategies, and implementation pipelines of KG-PTM frameworks tailored to HVDC systems, benchmarking them against traditional diagnostic schemes. The paradigm demonstrates superior noise robustness, few-shot generalization, and decision explainability. However, open challenges remain, such as automated, conflict-free knowledge updating; principled integration of electro-magnetic physical constraints; real-time, resource-constrained deployment; and quantifiable trustworthiness. Future research should therefore advance autonomous knowledge engineering, physics-informed pre-training, lightweight model compression, and standardized evaluation platforms to translate KG-PTM prototypes into dependable industrial tools for intelligent HVDC operation and maintenance. Full article
(This article belongs to the Special Issue Energy, Electrical and Power Engineering: 5th Edition)

24 pages, 3486 KB  
Article
Zero-Shot Industrial Anomaly Detection via CLIP-DINOv2 Multimodal Fusion and Stabilized Attention Pooling
by Junjie Jiang, Zongxiang He, Anping Wan, Khalil AL-Bukhaiti, Kaiyang Wang, Peiyi Zhu and Xiaomin Cheng
Electronics 2025, 14(24), 4785; https://doi.org/10.3390/electronics14244785 - 5 Dec 2025
Cited by 1 | Viewed by 3722
Abstract
Industrial visual inspection demands high-precision anomaly detection amid scarce annotations and unseen defects. This paper introduces a zero-shot framework leveraging multimodal feature fusion and stabilized attention pooling. CLIP’s global semantic embeddings are hierarchically aligned with DINOv2’s multi-scale structural features via a Dual-Modality Attention (DMA) mechanism, enabling effective cross-modal knowledge transfer for capturing macro- and micro-anomalies. A Stabilized Attention-based Pooling (SAP) module adaptively aggregates discriminative representations using self-generated anomaly heatmaps, enhancing localization accuracy and mitigating feature dilution. Trained solely on auxiliary datasets with multi-task segmentation and contrastive losses, the approach requires no target-domain samples. Extensive evaluation across seven benchmarks (MVTec AD, VisA, BTAD, MPDD, KSDD, DAGM, DTD-Synthetic) demonstrates state-of-the-art performance, achieving 93.4% image-level AUROC, 94.3% AP, 96.9% pixel-level AUROC, and 92.4% AUPRO on average. Ablation studies confirm the efficacy of DMA and SAP, while qualitative results highlight superior boundary precision and noise suppression. The framework offers a scalable, annotation-efficient solution for real-world industrial anomaly detection. Full article
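
The stabilized attention-based pooling idea, aggregating patch features with weights derived from a self-generated anomaly heatmap, can be sketched as follows; the temperature-scaled softmax and tensor shapes are assumptions rather than the paper's exact formulation.

```python
# A minimal sketch (not the SAP module itself) of pooling patch features with
# weights from an anomaly heatmap; the temperature damps extreme scores so a
# few patches do not dominate the pooled representation.
import torch

def stabilized_attention_pool(patch_feats, heatmap, tau=2.0):
    # patch_feats: (B, N, D) patch tokens; heatmap: (B, N) anomaly scores per patch.
    weights = torch.softmax(heatmap / tau, dim=-1)            # (B, N) pooling weights
    return (weights.unsqueeze(-1) * patch_feats).sum(dim=1)   # (B, D) pooled representation

pooled = stabilized_attention_pool(torch.randn(2, 196, 768), torch.rand(2, 196))
```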

17 pages, 2494 KB  
Article
Adaptive Contrastive Metric Network with Background Suppression for Few-Shot SAR Target Recognition
by Rui Cai, Chao Huang, Feng Yu and Jingcheng Zhao
Electronics 2025, 14(23), 4684; https://doi.org/10.3390/electronics14234684 - 27 Nov 2025
Viewed by 413
Abstract
Deep learning-based synthetic aperture radar (SAR) target recognition often suffers from overfitting under few-shot conditions, making it difficult to fully exploit the discriminative features contained in limited samples. Moreover, SAR targets frequently exhibit highly similar background scattering patterns, which further increase intra-class variations and reduce inter-class separability, thereby constraining the performance of few-shot recognition. To address these challenges, this paper proposes an adaptive contrastive metric (ACM) network with background suppression for few-shot SAR target recognition. Specifically, a spatial squeeze-and-excitation (SSE) attention module is introduced to adaptively highlight salient scattering structures of the target while effectively suppressing noise and irrelevant background interference, thus enhancing the robustness of feature representation. In addition, an ACM module is designed, where query samples are compared not only with their corresponding support class but also with the remaining classes. This enables explicit suppression of confusing background features and enlarges inter-class margins, thereby improving the discriminability of the learned feature space. The experimental results on publicly available SAR target recognition datasets demonstrate that the proposed method achieves significant improvements in background suppression and consistently outperforms several state-of-the-art metric-based few-shot learning approaches, validating the effectiveness and generalizability of the proposed framework. Full article
(This article belongs to the Special Issue Innovative Technologies and Services for Unmanned Aerial Vehicles)
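
A spatial squeeze-and-excitation block, in its common form, squeezes the channel dimension with a 1x1 convolution and uses the resulting map to gate spatial locations; the sketch below shows that generic form, and the paper's SSE module may differ in detail.

```python
# A minimal sketch of a spatial squeeze-and-excitation block (generic form, not
# necessarily the paper's exact SSE design).
import torch
import torch.nn as nn

class SpatialSE(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.squeeze = nn.Conv2d(channels, 1, kernel_size=1)   # squeeze channels to one map

    def forward(self, x):
        gate = torch.sigmoid(self.squeeze(x))    # (B, 1, H, W) spatial attention map
        return x * gate                          # emphasize salient locations, damp background

sse = SpatialSE(64)
y = sse(torch.randn(2, 64, 32, 32))
```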

32 pages, 2623 KB  
Article
Physics-Guided Self-Supervised Few-Shot Learning for Ultrasonic Defect Detection in Concrete Structures
by Mehmet Esen Eren
Buildings 2025, 15(23), 4227; https://doi.org/10.3390/buildings15234227 - 23 Nov 2025
Cited by 2 | Viewed by 1197
Abstract
This study introduces a physics-guided self-supervised framework for few-shot ultrasonic defect detection in concrete structures, addressing the dual challenges of scarce labels and domain variability in structural health monitoring (SHM). Our method integrates physics-informed augmentations, contrastive representation learning, and adversarial domain alignment within a mutually reinforcing cycle, enabling robust defect classification with minimal supervision. A Physics-Informed Augmentation Module synthesizes realistic ultrasonic signals, training a Transformer encoder to extract invariant features while suppressing sensor noise. An Adversarial Feature Aligner further improves cross-domain generalization by mitigating distribution shifts across heterogeneous concretes. Experimental validation on three benchmark datasets demonstrates 63–66% accuracy in one-shot cross-domain tasks and up to 89% in five-shot settings. These results represent 12–15 percentage point gains over modern few-shot baselines, with improvements statistically significant at p < 0.001. Compatible with existing ultrasonic hardware, the proposed framework bridges physics-based modeling and machine learning while paving the way for scalable, field-ready SHM solutions for aging infrastructure and resilient smart cities. Full article
(This article belongs to the Special Issue Structural Health Monitoring Through Advanced Artificial Intelligence)

23 pages, 8644 KB  
Article
Understanding What the Brain Sees: Semantic Recognition from EEG Responses to Visual Stimuli Using Transformer
by Ahmed Fares
AI 2025, 6(11), 288; https://doi.org/10.3390/ai6110288 - 7 Nov 2025
Cited by 1 | Viewed by 2099
Abstract
Understanding how the human brain processes and interprets multimedia content represents a frontier challenge in neuroscience and artificial intelligence. This study introduces a novel approach to decode semantic information from electroencephalogram (EEG) signals recorded during visual stimulus perception. We present DCT-ViT, a spatial–temporal transformer architecture that pioneers automated semantic recognition from brain activity patterns, advancing beyond conventional brain state classification to interpret higher level cognitive understanding. Our methodology addresses three fundamental innovations: First, we develop a topology-preserving 2D electrode mapping that, combined with temporal indexing, generates 3D spatial–temporal representations capturing both anatomical relationships and dynamic neural correlations. Second, we integrate discrete cosine transform (DCT) embeddings with standard patch and positional embeddings in the transformer architecture, enabling frequency-domain analysis that quantifies activation variability across spectral bands and enhances attention mechanisms. Third, we introduce the Semantics-EEG dataset comprising ten semantic categories extracted from visual stimuli, providing a benchmark for brain-perceived semantic recognition research. The proposed DCT-ViT model achieves 72.28% recognition accuracy on Semantics-EEG, substantially outperforming LSTM-based and attention-augmented recurrent baselines. Ablation studies demonstrate that DCT embeddings contribute meaningfully to model performance, validating their effectiveness in capturing frequency-specific neural signatures. Interpretability analyses reveal neurobiologically plausible attention patterns, with visual semantics activating occipital–parietal regions and abstract concepts engaging frontal–temporal networks, consistent with established cognitive neuroscience models. To address systematic misclassification between perceptually similar categories, we develop a hierarchical classification framework with boundary refinement mechanisms. This approach substantially reduces confusion between overlapping semantic categories, elevating overall accuracy to 76.15%. Robustness evaluations demonstrate superior noise resilience, effective cross-subject generalization, and few-shot transfer capabilities to novel categories. This work establishes the technical foundation for brain–computer interfaces capable of decoding semantic understanding, with implications for assistive technologies, cognitive assessment, and human–AI interaction. Both the Semantics-EEG dataset and DCT-ViT implementation are publicly released to facilitate reproducibility and advance research in neural semantic decoding. Full article
(This article belongs to the Special Issue AI in Bio and Healthcare Informatics)
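
The DCT embedding idea, projecting each token's temporal signal onto a type-II DCT basis and mapping the leading coefficients to the model width before adding them to patch and positional embeddings, can be sketched as follows; the patch length, number of coefficients, and model width are assumptions, not values from the paper.

```python
# A minimal sketch (not the DCT-ViT code) of a DCT-based token embedding.
import math
import torch
import torch.nn as nn

def dct_ii_basis(n):
    # Column k holds the (unnormalized) k-th type-II DCT basis vector of length n.
    k = torch.arange(n).unsqueeze(0).float()
    i = torch.arange(n).unsqueeze(1).float()
    return torch.cos(math.pi / n * (i + 0.5) * k)

class DCTEmbedding(nn.Module):
    def __init__(self, patch_len, num_coeffs, dim):
        super().__init__()
        self.register_buffer("basis", dct_ii_basis(patch_len)[:, :num_coeffs])
        self.proj = nn.Linear(num_coeffs, dim)

    def forward(self, patches):
        # patches: (B, N, patch_len) raw temporal segments per token.
        coeffs = patches @ self.basis            # (B, N, num_coeffs) spectral content
        return self.proj(coeffs)                 # to be added to patch + positional embeddings

emb = DCTEmbedding(patch_len=128, num_coeffs=16, dim=256)
tokens = emb(torch.randn(2, 64, 128))
```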

17 pages, 550 KB  
Article
AnomalyNLP: Noisy-Label Prompt Learning for Few-Shot Industrial Anomaly Detection
by Li Hua and Jin Qian
Electronics 2025, 14(20), 4016; https://doi.org/10.3390/electronics14204016 - 13 Oct 2025
Viewed by 2151
Abstract
Few-Shot Industrial Anomaly Detection (FSIAD) is an essential yet challenging problem in practical scenarios such as industrial quality inspection. Its objective is to identify previously unseen anomalous regions using only a limited number of normal support images from the same category. Recently, large pre-trained vision-language models (VLMs), such as CLIP, have exhibited remarkable few-shot image-text representation abilities across a range of visual tasks, including anomaly detection. Despite their promise, real-world industrial anomaly datasets often contain noisy labels, which can degrade prompt learning and detection performance. In this paper, we propose AnomalyNLP, a new Noisy-Label Prompt Learning approach designed to tackle the challenge of few-shot anomaly detection. This framework offers a simple and efficient approach that leverages the expressive representations and precise alignment capabilities of VLMs for industrial anomaly detection. First, we design a Noisy-Label Prompt Learning (NLPL) strategy. This strategy utilizes feature learning principles to suppress the influence of noisy samples via Mean Absolute Error (MAE) loss, thereby improving the signal-to-noise ratio and enhancing overall model robustness. Furthermore, we introduce a prompt-driven optimal transport feature purification method to accurately partition datasets into clean and noisy subsets. For both image-level and pixel-level anomaly detection, AnomalyNLP achieves state-of-the-art performance across various few-shot settings on the MVTecAD and VisA public datasets. Qualitative and quantitative results on two datasets demonstrate that our method achieves the largest average AUC improvement over baseline methods across 1-, 2-, and 4-shot settings, with gains of up to 10.60%, 10.11%, and 9.55% in practical anomaly detection scenarios. Full article
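
The noise-robust classification term based on mean absolute error can be sketched in a few lines: MAE between softmax probabilities and one-hot targets bounds the per-sample gradient, which is the standard argument for its robustness to mislabeled examples. The prompt-learning and optimal-transport components are omitted, and the shapes below are assumptions.

```python
# A minimal sketch of an MAE-based classification loss for noisy labels.
import torch
import torch.nn.functional as F

def mae_classification_loss(logits, targets):
    # logits: (B, C); targets: (B,) integer class labels (possibly noisy).
    probs = logits.softmax(dim=-1)
    one_hot = F.one_hot(targets, num_classes=logits.size(-1)).float()
    return (probs - one_hot).abs().sum(dim=-1).mean()   # bounded per-sample contribution

loss = mae_classification_loss(torch.randn(8, 2), torch.randint(0, 2, (8,)))
```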

20 pages, 3126 KB  
Article
Few-Shot Image Classification Algorithm Based on Global–Local Feature Fusion
by Lei Zhang, Xinyu Yang, Xiyuan Cheng, Wenbin Cheng and Yiting Lin
AI 2025, 6(10), 265; https://doi.org/10.3390/ai6100265 - 9 Oct 2025
Cited by 11 | Viewed by 2728
Abstract
Few-shot image classification seeks to recognize novel categories from only a handful of labeled examples, but conventional metric-based methods that rely mainly on global image features often produce unstable prototypes under extreme data scarcity, while local-descriptor approaches can lose context and suffer from inter-class local-pattern overlap. To address these limitations, we propose a Global–Local Feature Fusion network that combines a frozen, pretrained global feature branch with a self-attention based multi-local feature fusion branch. Multiple random crops are encoded by a shared backbone (ResNet-12), projected to Query/Key/Value embeddings, and fused via scaled dot-product self-attention to suppress background noise and highlight discriminative local cues. The fused local representation is concatenated with the global feature to form robust class prototypes used in a prototypical-network style classifier. On four benchmarks, our method achieves strong improvements: Mini-ImageNet 70.31% ± 0.20 (1-shot)/85.91% ± 0.13 (5-shot), Tiered-ImageNet 73.37% ± 0.22/87.62% ± 0.14, FC-100 47.01% ± 0.20/64.13% ± 0.19, and CUB-200-2011 82.80% ± 0.18/93.19% ± 0.09, demonstrating consistent gains over competitive baselines. Ablation studies show that (1) naive local averaging improves over global-only baselines, (2) self-attention fusion yields a large additional gain (e.g., +4.50% in 1-shot on Mini-ImageNet), and (3) concatenating global and fused local features gives the best overall performance. These results indicate that explicitly modeling inter-patch relations and fusing multi-granularity cues produces markedly more discriminative prototypes in few-shot regimes. Full article
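
The fusion step, projecting crop features to Query/Key/Value embeddings, applying scaled dot-product self-attention, and concatenating the aggregated local representation with the global feature, can be sketched as follows; the feature width, the number of crops, and the mean aggregation are assumptions for illustration.

```python
# A minimal sketch of self-attention fusion over local-crop features followed by
# concatenation with a frozen global feature (illustrative, not the paper's code).
import torch
import torch.nn as nn

class LocalFusion(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, local_feats, global_feat):
        # local_feats: (B, M, D) features of M random crops; global_feat: (B, D).
        q, k, v = self.q(local_feats), self.k(local_feats), self.v(local_feats)
        attn = torch.softmax(q @ k.transpose(1, 2) / q.size(-1) ** 0.5, dim=-1)  # (B, M, M)
        fused = (attn @ v).mean(dim=1)                  # aggregate attended crops -> (B, D)
        return torch.cat([global_feat, fused], dim=-1)  # (B, 2D) input for class prototypes

fusion = LocalFusion(640)
rep = fusion(torch.randn(4, 6, 640), torch.randn(4, 640))
```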

21 pages, 9052 KB  
Article
SAM–Attention Synergistic Enhancement: SAR Image Object Detection Method Based on Visual Large Model
by Yirong Yuan, Jie Yang, Lei Shi and Lingli Zhao
Remote Sens. 2025, 17(19), 3311; https://doi.org/10.3390/rs17193311 - 26 Sep 2025
Cited by 1 | Viewed by 1891
Abstract
The object detection model for synthetic aperture radar (SAR) images needs to have strong generalization ability and more stable detection performance due to the complex scattering mechanism, high sensitivity of the orientation angle, and susceptibility to speckle noise. Visual large models possess strong generalization capabilities for natural image processing, but their application to SAR imagery remains relatively rare. This paper attempts to introduce a visual large model into the SAR object detection task, aiming to alleviate the problems of weak cross-domain generalization and poor adaptability to few-shot samples caused by the characteristics of SAR images in existing models. The proposed model comprises an image encoder, an attention module, and a detection decoder. The image encoder leverages the pre-trained Segment Anything Model (SAM) for effective feature extraction from SAR images. An Adaptive Channel Interactive Attention (ACIA) module is introduced to suppress SAR speckle noise. Further, a Dynamic Tandem Attention (DTA) mechanism is proposed in the decoder to integrate scale perception, spatial focusing, and task adaptation, while decoupling classification from detection for improved accuracy. Leveraging the strong representational and few-shot adaptation capabilities of large pre-trained models, this study evaluates their cross-domain and few-shot detection performance on SAR imagery. For cross-domain detection, the model was trained on AIR-SARShip-1.0 and tested on SSDD, achieving an mAP50 of 0.54. For few-shot detection on SAR-AIRcraft-1.0, using only 10% of the training samples, the model reached an mAP50 of 0.503. Full article
(This article belongs to the Special Issue Big Data Era: AI Technology for SAR and PolSAR Image)
