Search Results (246)

Search Parameters:
Keywords = cue-weighting

18 pages, 1085 KB  
Article
Self-Learning Multimodal Emotion Recognition Based on Multi-Scale Dilated Attention
by Xiuli Du and Luyao Zhu
Brain Sci. 2026, 16(4), 350; https://doi.org/10.3390/brainsci16040350 - 25 Mar 2026
Viewed by 304
Abstract
Background/Objectives: Emotions can be recognized through external behavioral cues and internal physiological signals. Owing to the inherently complex psychological and physiological nature of emotions, models relying on a single modality often suffer from limited robustness. This study aims to improve emotion recognition performance by effectively integrating electroencephalogram (EEG) signals and facial expressions through a multimodal framework. Methods: We propose a multimodal emotion recognition model that employs a Multi-Scale Dilated Attention Convolution (MSDAC) network tailored for facial expression recognition, integrates an EEG emotion recognition method based on three-dimensional features, and adopts a self-learning decision-level fusion strategy. MSDAC incorporates Multi-Scale Dilated Convolutions and a Dual-Branch Attention (D-BA) module to capture discontinuous facial action units. For EEG processing, raw signals are converted into a multidimensional time–frequency–spatial representation to preserve temporal, spectral, and spatial information. To overcome the limitations of traditional stitching or fixed-weight fusion approaches, a self-learning weight fusion mechanism is introduced at the decision level to adaptively adjust modality contributions. Results: The facial analysis branch achieved average accuracies of 74.1% on FER2013, 99.69% on CK+, and 98.05% (valence)/96.15% (arousal) on DEAP. On the DEAP dataset, the complete multimodal model reached 98.66% accuracy for valence and 97.49% for arousal classification. Conclusions: The proposed framework enhances emotion recognition by improving facial feature extraction and enabling adaptive multimodal fusion, demonstrating the effectiveness of combining EEG and facial information for robust emotion analysis. Full article
(This article belongs to the Section Cognitive, Social and Affective Neuroscience)
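The self-learning decision-level fusion described in this abstract can be pictured with a minimal PyTorch sketch; this is an illustrative reconstruction of the general idea (trainable, softmax-normalized per-modality weights applied to each branch's class logits), not the authors' released code, and the EEG/face branch names, batch size, and class count are placeholder assumptions.

```python
import torch
import torch.nn as nn

class SelfLearningDecisionFusion(nn.Module):
    """Illustrative decision-level fusion: one trainable weight per modality,
    softmax-normalized so the contributions stay positive and sum to one."""
    def __init__(self, num_modalities: int = 2):
        super().__init__()
        self.raw_weights = nn.Parameter(torch.zeros(num_modalities))

    def forward(self, logits_per_modality):
        # logits_per_modality: list of tensors, each shaped (batch, num_classes)
        w = torch.softmax(self.raw_weights, dim=0)
        stacked = torch.stack(logits_per_modality, dim=0)   # (modalities, batch, classes)
        return (w.view(-1, 1, 1) * stacked).sum(dim=0)      # fused logits

# Hypothetical usage with an EEG branch and a facial-expression branch (2-class valence):
fusion = SelfLearningDecisionFusion(num_modalities=2)
eeg_logits = torch.randn(8, 2)
face_logits = torch.randn(8, 2)
fused = fusion([eeg_logits, face_logits])                    # (8, 2)
```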

24 pages, 1460 KB  
Perspective
From Sensing to Sense-Making: A Framework for On-Person Intelligence with Wearable Biosensors and Edge LLMs
by Tad T. Brunyé, Mitchell V. Petrimoulx and Julie A. Cantelon
Sensors 2026, 26(7), 2034; https://doi.org/10.3390/s26072034 - 25 Mar 2026
Viewed by 509
Abstract
Wearable biosensors increasingly stream multi-channel physiological and behavioral data outside the laboratory, yet most deployments still end in dashboards or threshold alarms that leave interpretation open to the user. In high-stakes domains, such as military, emergency response, aviation, industry, and elite sport, the constraint is rarely data availability but the cognitive effort required to convert noisy signals into timely, actionable decisions. We argue for on-person cognitive co-pilots: systems that integrate multimodal sensing, compute probabilistic state estimates on devices, synthesize those states with task and environmental context using locally hosted large language models (LLMs), and deliver recommendations through attention-appropriate cues that preserve autonomy. Enabling conditions include mature wearable sensing, edge artificial intelligence (AI) accelerators, tiny machine learning (TinyML) pipelines, privacy-preserving learning, and open-weight LLMs capable of local deployment with retrieval and guardrails. However, critical research gaps remain across layers: sensor validity under real-world conditions, uncertainty calibration and fusion under distribution shift, verification of LLM-mediated reasoning, interaction design that avoids alarm fatigue and automation bias, and governance models that protect privacy and consent in constrained settings. We propose a layered technical framework and research agenda grounded in cognitive engineering and human–automation interaction. Our core claim is that local, uncertainty-aware reasoning is an architectural prerequisite for trustworthy, low-latency augmentation in isolated, confined, and extreme environments. Full article
(This article belongs to the Special Issue Sensors in 2026)

20 pages, 37476 KB  
Article
In-Orbit MapAnything: An Enhanced Feed-Forward Metric Framework for 3D Reconstruction of Non-Cooperative Space Targets Under Complex Lighting
by Yinxi Lu, Hongyuan Wang, Qianhao Ning, Ziyang Liu, Yunzhao Zang, Zhen Liao and Zhiqiang Yan
Sensors 2026, 26(7), 2026; https://doi.org/10.3390/s26072026 - 24 Mar 2026
Viewed by 332
Abstract
Precise 3D reconstruction of non-cooperative space targets is a prerequisite for active debris removal and on-orbit servicing. However, this task is impeded by severe environmental challenges. Specifically, the limited dynamic range of visible light cameras leads to frequent overexposure or underexposure under extreme space lighting. Compounded by sparse textures and strong specular reflections, these factors significantly constrain reconstruction accuracy. While existing general-purpose feed-forward models such as MapAnything offer efficient inference, their geometric recovery capabilities degrade sharply when facing significant domain shifts. To address these issues, this paper proposes an enhanced 3D reconstruction framework tailored for the space environment named In-Orbit MapAnything. First, to mitigate data scarcity, we construct a high-quality space target dataset incorporating extreme illumination characteristics, which provides comprehensive auxiliary modalities including accurate camera poses and dense point clouds. Second, we propose the SatMap-Adapter module to mitigate feature degradation caused by severe specular reflections. This architecture employs a hierarchical cascade sampling strategy to align multi-level backbone features and utilizes a lightweight adaptive fusion module to dynamically integrate shallow photometric cues, intermediate structural information, and deep semantic features. Finally, we employ a weight-decomposed low-rank adaptation strategy to achieve parameter-efficient fine-tuning while strictly freezing the pre-trained backbone. Experimental results demonstrate that the proposed method decreases the absolute relative error and Chamfer distance by 15.23% and 20.02% respectively compared to the baseline MapAnything model, while maintaining a rapid inference speed. The proposed approach effectively suppresses reconstruction noise on metallic surfaces and recovers fine geometric structures, validating the effectiveness of our feature-enhanced framework in extreme space environments. Full article
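The weight-decomposed low-rank adaptation mentioned in this abstract follows a widely used parameter-efficient fine-tuning pattern; the sketch below is a simplified, generic illustration (frozen base weight, low-rank update, learnable per-output magnitude) rather than the paper's actual fine-tuning code, and the layer size and rank are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightDecomposedLowRankLinear(nn.Module):
    """Generic sketch: keep the pre-trained weight frozen, learn a low-rank update
    B @ A, and rescale each output direction with a learnable magnitude."""
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                          # strictly freeze the backbone layer
        out_f, in_f = base.weight.shape
        self.A = nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_f, rank))      # zero-init so training starts at the base weight
        self.magnitude = nn.Parameter(base.weight.detach().norm(dim=1, keepdim=True).clone())

    def forward(self, x):
        merged = self.base.weight + self.B @ self.A          # frozen weight + low-rank update
        direction = merged / merged.norm(dim=1, keepdim=True).clamp_min(1e-6)
        return F.linear(x, self.magnitude * direction, self.base.bias)

# Hypothetical usage on one projection layer of a frozen backbone:
adapted = WeightDecomposedLowRankLinear(nn.Linear(768, 768), rank=8)
y = adapted(torch.randn(4, 768))                             # (4, 768)
```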

28 pages, 502 KB  
Article
Emotional Framing in Prompts Modulates Large Language Model Performance
by Manuel Gozzi and Francesca Fallucchi
Big Data Cogn. Comput. 2026, 10(4), 102; https://doi.org/10.3390/bdcc10040102 - 24 Mar 2026
Viewed by 494
Abstract
Large Language Models (LLMs) demonstrate remarkable performance across a variety of natural language understanding tasks, yet their sensitivity to emotional framing in user prompts remains underexplored. This paper presents an empirical study investigating how four emotional tones—joy, apathy, anger, and fear—affect LLM performance on the SuperGLUE benchmark. We evaluate five instruction-tuned, open-weight models across eight diverse tasks, systematically modulating input prompts with affective cues while keeping semantic content constant. Results reveal that prompts framed with joy and apathy lead to consistently higher accuracy, with gains of up to 4.5 percentage points compared to fear-framed inputs, which yield the lowest performance. These findings demonstrate that affective modulation in user prompts measurably impacts LLM reasoning and task outcomes, suggesting that emotional framing is not merely stylistic but functionally relevant to model behavior. Our study provides a reproducible experimental framework and an open-source prompt set, offering a foundation for future research on affect-aware prompting strategies and their implications in human–AI interaction. Full article
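The experimental manipulation this abstract describes, holding task content fixed while varying the emotional tone of the prompt, can be sketched as follows; the framing sentences, the toy entailment item, and the commented-out model call are illustrative placeholders, not the study's published prompt set.

```python
# Illustrative affective framings; the task content below stays identical across tones,
# so any accuracy difference can be attributed to the framing alone.
AFFECTIVE_FRAMES = {
    "joy":     "I'm really excited to work on this with you!",
    "apathy":  "It doesn't matter much either way, but answer anyway.",
    "anger":   "This is infuriating; just get it right for once.",
    "fear":    "I'm terrified of getting this wrong, please be careful.",
    "neutral": "",
}

def frame_prompt(task_prompt: str, tone: str) -> str:
    return f"{AFFECTIVE_FRAMES[tone]}\n{task_prompt}".strip()

task = ("Premise: The cat sat on the mat.\n"
        "Hypothesis: An animal is on the mat.\n"
        "Answer 'entailment' or 'not entailment'.")

for tone in AFFECTIVE_FRAMES:
    prompt = frame_prompt(task, tone)
    # answer = model.generate(prompt)   # hypothetical call to an open-weight instruction-tuned model
    print(f"--- {tone} ---\n{prompt}\n")
```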

19 pages, 13660 KB  
Article
CA-GFNet: A Cross-Modal Adaptive Gated Fusion Network for Facial Emotion Recognition
by Sitara Afzal and Jong-Ha Lee
Mathematics 2026, 14(6), 1068; https://doi.org/10.3390/math14061068 - 21 Mar 2026
Viewed by 236
Abstract
Facial emotion recognition (FER) plays an important role in healthcare, human–computer interaction, and intelligent security systems. However, despite recent advances, many state-of-the-art FER methods depend on computationally intensive CNN or transformer backbones and large-scale annotated datasets while suffering noticeable performance degradation under cross-dataset evaluation because of domain shift. These limitations hinder practical usage in resource-constrained and real-world environments. To address this issue, we propose Cross-Adaptive Gated Fusion Network (CA-GFNet), a lightweight dual-stream FER framework that explicitly combines shallow structural features with deep semantic representations. The proposed architecture integrates domain-robust gradient-based descriptors with compact deep features extracted from a VGG-based backbone. After face detection and normalization, the structural stream captures fine-grained local appearance cues, whereas the semantic stream encodes high-level facial configurations. The two feature streams are projected into a shared latent space and adaptively fused using a gated fusion mechanism that learns sample-specific weights, allowing the model to prioritize the more reliable feature source under dataset shift. Extensive experiments on KDEF along with zero-shot cross-dataset evaluation on CK+ using a strict train-on-KDEF/test-on-CK+ protocol with subject-independent splits demonstrate the effectiveness of the proposed method. CA-GFNet achieves 99.30% accuracy on KDEF and 98.98% on CK+ while requiring significantly fewer parameters than conventional deep FER models. These results confirm that adaptive gated fusion of shallow and deep features can deliver both high recognition accuracy and strong cross-dataset robustness. Full article
(This article belongs to the Special Issue Advanced Algorithms in Multimodal Affective Computing)
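As a rough illustration of the sample-specific gated fusion this abstract describes (not the CA-GFNet implementation), the sketch below projects a shallow structural stream and a deep semantic stream into a shared latent space and mixes them with a sigmoid gate computed from their concatenation; the input dimensions are hypothetical.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Illustrative gated fusion of two feature streams with a per-sample gate."""
    def __init__(self, dim_structural: int, dim_semantic: int, dim_shared: int = 256):
        super().__init__()
        self.proj_structural = nn.Linear(dim_structural, dim_shared)
        self.proj_semantic = nn.Linear(dim_semantic, dim_shared)
        self.gate = nn.Sequential(nn.Linear(2 * dim_shared, dim_shared), nn.Sigmoid())

    def forward(self, structural_feat, semantic_feat):
        s = torch.tanh(self.proj_structural(structural_feat))
        d = torch.tanh(self.proj_semantic(semantic_feat))
        g = self.gate(torch.cat([s, d], dim=-1))    # per-sample weights in [0, 1]
        return g * s + (1.0 - g) * d                # lean on whichever stream looks more reliable

# Hypothetical feature sizes for a gradient-descriptor stream and a VGG-style deep stream:
fusion = GatedFusion(dim_structural=1764, dim_semantic=512)
fused = fusion(torch.randn(8, 1764), torch.randn(8, 512))    # (8, 256)
```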

23 pages, 13051 KB  
Article
BAWSeg: A UAV Multispectral Benchmark for Barley Weed Segmentation
by Haitian Wang, Xinyu Wang, Muhammad Ibrahim, Dustin Severtson and Ajmal Mian
Remote Sens. 2026, 18(6), 915; https://doi.org/10.3390/rs18060915 - 17 Mar 2026
Viewed by 277
Abstract
Accurate weed mapping in cereal fields requires pixel-level segmentation from unmanned aerial vehicle (UAV) imagery that remains reliable across fields, seasons, and illumination. Existing multispectral pipelines often depend on thresholded vegetation indices, which are brittle under radiometric drift and mixed crop–weed pixels, or on single-stream convolutional neural network (CNN) and Transformer backbones that ingest stacked bands and indices, where radiance cues and normalized index cues interfere and reduce sensitivity to small weed clusters embedded in crop canopy. We propose VISA (Vegetation Index and Spectral Attention), a two-stream segmentation network that decouples these cues and fuses them at native resolution. The radiance stream learns from calibrated five-band reflectance using local residual convolutions, channel recalibration, spatial gating, and skip-connected decoding, which preserve fine textures, row boundaries, and small weed structures that are often weakened after ratio-based index compression. The index stream operates on vegetation-index maps with windowed self-attention to model local structure efficiently, state-space layers to propagate field-scale context without quadratic attention cost, and Slot Attention to form stable region descriptors that improve discrimination of sparse weeds under canopy mixing. To support supervised training and deployment-oriented evaluation, we introduce BAWSeg, a four-year UAV multispectral dataset collected over commercial barley paddocks in Western Australia, providing radiometrically calibrated blue, green, red, red edge, and near-infrared orthomosaics, derived vegetation indices, and dense crop, weed, and other labels with leakage-free block splits. On BAWSeg, VISA achieves 75.6% mean Intersection over Union (mIoU) and 63.5% weed Intersection over Union (IoU) with 22.8 M parameters, outperforming a multispectral SegFormer-B1 baseline by 1.2 mIoU and 1.9 weed IoU. Under cross-plot and cross-year protocols, VISA maintains 71.2% and 69.2% mIoU, respectively. The full BAWSeg benchmark dataset, VISA code, trained model weights, and protocol files will be released upon publication. Full article

19 pages, 6716 KB  
Article
Multi-Type Weld Defect Detection in Galvanized Sheet MIG Welding Using an Improved YOLOv10 Model
by Bangzhi Xiao, Yadong Yang, Yinshui He and Guohong Ma
Materials 2026, 19(6), 1178; https://doi.org/10.3390/ma19061178 - 17 Mar 2026
Viewed by 319
Abstract
Shop-floor weld inspection may appear to be a solved problem until a camera is deployed near a galvanized-sheet MIG welding line. The seam reflects light, the texture changes from frame to frame, and the defects of interest are often small and visually subtle. Additionally, the hardware near the line is rarely a data-center GPU. With those constraints in mind, this paper presents YOLO-MIG, a compact detector built on YOLOv10n for weld-seam inspection in practical production conditions. We make three focused changes to the baseline: a C2f-EMSCP backbone block to better preserve weak defect cues with modest parameter growth, a BiFPN neck to keep small-target information alive during feature fusion, and a C2fCIB head to clean up predictions that otherwise get distracted by seam edges and illumination artifacts. On a workshop-collected dataset containing 326 original images, with the training subset expanded through augmentation to 2608 labeled samples in total, YOLO-MIG achieves 98.4% mAP@0.5 and 56.29% mAP@0.5:0.95 on the test set while remaining lightweight (1.83 M parameters, 3.87 MB FP16 weights). Compared with YOLOv10n, the proposed model improves mAP@0.5 by 9.36 points and mAP@0.5:0.95 by 4.89 points, while reducing parameters, GFLOPs, and model size by 43.4%, 19.9%, and 29.9%, respectively. The results suggest that YOLO-MIG is not only accurate but also realistic to deploy at the edge for intelligent weld quality control. Full article
(This article belongs to the Section Manufacturing Processes and Systems)

23 pages, 2010 KB  
Article
Visibility-Prior Guided Dual-Stream Mixture-of-Experts for Robust Facial Expression Recognition Under Complex Occlusions
by Siyuan Ma, Long Liu, Mingzhi Cheng, Peijun Qin, Zixuan Han, Cui Chen, Shizhao Yang and Hongjuan Wang
Electronics 2026, 15(6), 1230; https://doi.org/10.3390/electronics15061230 - 16 Mar 2026
Viewed by 294
Abstract
Facial occlusion induces sample-wise reliability shifts in facial expression recognition (FER), where the usefulness of global context and local discriminative cues varies dramatically with the amount of visible facial information. Existing occlusion-robust FER studies often evaluate under limited or homogeneous occlusion settings and commonly adopt static fusion strategies, which are insufficient for complex and heterogeneous real-world occlusions. In this work, we establish a rigorous occlusion robustness evaluation protocol by constructing a fixed offline test benchmark with diverse synthetic occlusion patterns (e.g., masks, sunglasses, texture blocks, and mixed occlusions) on top of public FER test splits. We further propose a Dual-Stream Adaptive Weighting Mixture-of-Experts framework (DS-AW-MoE) that fuses a global contextual expert and a local discriminative expert via an occlusion-aware weighting network. Crucially, we introduce a facial visibility assessment as a task-agnostic prior to explicitly regulate expert contributions, enabling dynamic re-allocation of model capacity according to input-dependent feature reliability. Extensive experiments on public datasets and the constructed occlusion benchmark demonstrate that DS-AW-MoE achieves more stable recognition under complex occlusions, characterized by a smaller and more consistent performance drop. To support reproducibility under dataset license constraints, we will release an anonymous, fully runnable repository containing the complete occlusion synthesis pipeline, evaluation protocol, and configuration files, allowing researchers to reproduce the benchmark after obtaining the original datasets. Full article
(This article belongs to the Topic Computer Vision and Image Processing, 3rd Edition)

28 pages, 2882 KB  
Article
Semantic Divergence in AI-Generated and Human Influencer Product Recommendations: A Computational Analysis of Dual-Agent Communication in Social Commerce
by Woo-Chul Lee, Jang-Suk Lee and Jungho Suh
Appl. Sci. 2026, 16(6), 2816; https://doi.org/10.3390/app16062816 - 15 Mar 2026
Viewed by 448
Abstract
The proliferation of generative artificial intelligence (AI) as an autonomous recommendation agent fundamentally challenges traditional paradigms of marketing communication. As AI systems increasingly mediate consumer–brand relationships, understanding how artificial agents construct persuasive discourse—distinct from human communicators—becomes critical for developing effective dual-channel marketing strategies. Grounded in Source Credibility Theory and the Computers Are Social Actors (CASA) paradigm, this study investigates the semantic and structural divergence between AI-generated product recommendations and human influencer marketing messages in social commerce contexts. Employing a mixed-methods computational approach integrating term frequency analysis, TF-IDF weighting, Latent Dirichlet Allocation (LDA) topic modeling, and BERT-based contextualized semantic embedding analysis (KR-SBERT), we examined 330 Instagram influencer posts and 541 AI-generated responses concerning inner beauty enzyme products—a hybrid category combining functional health claims with hedonic beauty appeals—in the Korean social commerce market. AI-generated responses were collected through a systematically designed query protocol with empirically grounded prompts derived from actual consumer search behaviors, and analytical robustness was verified through sensitivity analyses across multiple parameter thresholds. Our findings reveal a fundamental divergence in persuasive architecture: human influencers construct experiential narratives exhibiting message characteristics typically associated with peripheral-route cues (sensory descriptions, emotional testimonials, social context), while AI recommendations employ systematic, evidence-based discourse exhibiting message characteristics typically associated with central-route argumentation (functional mechanisms, ingredient specifications, objective criteria). Topic modeling identified four distinct thematic clusters for each source type: human discourse centers on embodied experience and relational consumption, whereas AI discourse organizes around informational utility and rational decision support. Jensen–Shannon Divergence analysis (JSD = 0.213 bits) confirmed moderate distributional divergence, while chi-square testing (χ2 = 847.23, p < 0.001) and Cramér’s V (0.312, indicating a medium-to-large effect) demonstrated statistically significant and substantively meaningful differences. These findings extend CASA theory by demonstrating that AI recommendation agents develop a characteristic “AI communication signature” distinguishable from human persuasion patterns. We propose an integrated Dual-Agent Persuasion Proposition—synthesizing CASA, ELM, and Source Credibility perspectives—suggesting that AI and human recommenders serve complementary functions across different stages of the consumer decision journey—a proposition whose predictions regarding sequential persuasive effectiveness and consumer processing routes await experimental validation. These findings carry implications for AI content strategy optimization, platform design, and emerging regulatory frameworks for AI-generated content labeling. Full article
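The divergence statistics quoted in this abstract (Jensen–Shannon Divergence in bits, the chi-square test, and Cramér's V) are standard quantities; the sketch below shows how they can be computed with NumPy/SciPy on made-up term counts, purely to illustrate the calculation rather than to reproduce the study's corpus.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon
from scipy.stats import chi2_contingency

# Hypothetical term-frequency counts for the two source types (illustrative only).
human_counts = np.array([120, 45, 30, 10, 5], dtype=float)
ai_counts = np.array([40, 60, 55, 35, 20], dtype=float)

p = human_counts / human_counts.sum()
q = ai_counts / ai_counts.sum()

# SciPy returns the Jensen-Shannon *distance*; squaring gives the divergence, base=2 yields bits.
jsd_bits = jensenshannon(p, q, base=2) ** 2

# Chi-square test of independence on the 2 x K contingency table, plus Cramer's V effect size.
table = np.vstack([human_counts, ai_counts])
chi2, p_value, dof, _ = chi2_contingency(table)
cramers_v = np.sqrt(chi2 / (table.sum() * (min(table.shape) - 1)))

print(f"JSD = {jsd_bits:.3f} bits, chi2 = {chi2:.2f}, p = {p_value:.4f}, Cramer's V = {cramers_v:.3f}")
```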

21 pages, 23671 KB  
Article
Zero-Shot Polarization-Intensity Physical Fusion Monocular Depth Estimation for High Dynamic Range Scenes
by Renhao Rao, Zhizhao Ouyang, Shuang Chen, Liang Chen, Guoqin Huang and Changcai Cui
Photonics 2026, 13(3), 268; https://doi.org/10.3390/photonics13030268 - 11 Mar 2026
Viewed by 356
Abstract
Monocular 3D reconstruction remains a persistent challenge for autonomous driving systems in Degraded Visual Environments (DVEs) with extreme glare and low illumination, such as highway tunnels, due to the lack of reliable texture cues. This paper proposes a physics-aware deep learning framework that overcomes these limitations by fusing polarization sensing with conventional intensity imaging. Unlike traditional end-to-end data-driven fusion strategies, we propose a Modality-Aligned Parameter Injection strategy. By remapping the weight space of the input layer, this strategy achieves a smooth transfer of the pre-trained Vision Transformer (i.e., MiDaS) to multi-modal inputs. Its core advantage lies in the seamless integration of four-channel polarization geometric information while fully preserving the pre-trained semantic representation capabilities of the backbone network, thereby avoiding the overfitting risk associated with training from scratch on small-sample data. Furthermore, we design a Reliability-Aware Gating mechanism that dynamically re-weights appearance and geometric cues based on intensity saturation and the physical validity of polarization signals as measured by the Degree of Linear Polarization (DoLP). We validate the proposed method on our self-constructed POLAR-GLV benchmark, a real-world dataset collected specifically for high dynamic range tunnel scenarios. Extensive experiments demonstrate that our method consistently outperforms intensity-only baselines, reducing geometric reconstruction error by 24.2% in high-glare tunnel exit zones and 10.0% at tunnel entrances. Crucially, compared to multi-stream fusion architectures, these performance gains come with negligible additional computational cost, making the framework highly suitable for resource-constrained onboard inference environments. Full article
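For intuition only, the reliability-aware gating idea mentioned in this abstract can be caricatured with a hand-thresholded rule: route a pixel to the polarization-geometric cue when the intensity image saturates and the DoLP indicates a physically valid polarization signal. This is a crude illustration with assumed thresholds and placeholder inputs, not the paper's learned mechanism.

```python
import numpy as np

def reliability_gate(intensity: np.ndarray, dolp: np.ndarray,
                     sat_thresh: float = 0.95, dolp_min: float = 0.02) -> np.ndarray:
    """Illustrative per-pixel gate in [0, 1]: 1 keeps the appearance cue,
    0 hands over to the polarization-geometric cue."""
    saturated = intensity > sat_thresh        # over-exposed pixels: appearance cue unreliable
    polar_valid = dolp > dolp_min             # near-zero DoLP: polarization cue unreliable
    gate = np.ones_like(intensity)
    gate[saturated & polar_valid] = 0.0       # trust the geometric cue
    gate[saturated & ~polar_valid] = 0.5      # neither cue is trustworthy: average them
    return gate

# Placeholder inputs standing in for a saturated tunnel-exit frame:
intensity = np.random.rand(4, 4)
dolp = np.random.rand(4, 4) * 0.1
depth_appearance = np.random.rand(4, 4)
depth_geometric = np.random.rand(4, 4)
g = reliability_gate(intensity, dolp)
fused_depth = g * depth_appearance + (1.0 - g) * depth_geometric
```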

26 pages, 3911 KB  
Article
Integrated Multimodal Perception and Predictive Motion Forecasting via Cross-Modal Adaptive Attention
by Bakhita Salman, Alexander Chavez and Muneeb Yassin
Future Transp. 2026, 6(2), 64; https://doi.org/10.3390/futuretransp6020064 - 11 Mar 2026
Viewed by 387
Abstract
Accurate environmental perception is fundamental to safe autonomous driving; however, most existing multimodal systems rely on fixed or heuristic sensor fusion strategies that cannot adapt to scene-dependent variations in sensor reliability. This paper proposes Cross-Modal Adaptive Attention (CMAA), a unified end-to-end Bird’s-Eye-View (BEV) perception framework that dynamically fuses camera, LiDAR, and RADAR information through learnable, context-aware modality gating. Unlike static fusion approaches, CMAA adaptively reweights sensor contributions based on global scene descriptors, enabling the robust integration of semantic, geometric, and motion cues without manual tuning. The proposed architecture jointly performs 3D object detection, multi-object tracking, and motion forecasting within a shared BEV representation, preserving spatial alignment across tasks and supporting efficient real-time deployment. Experiments conducted on the official nuScenes validation split demonstrate that CMAA achieves 0.528 mAP and 0.691 NDS, outperforming fixed-weight fusion baselines while maintaining a compact model size and efficient inference. Additional tracking evaluation using the official nuScenes tracking devkit reports improved tracking performance, while motion forecasting experiments show reduced trajectory displacement errors (minADE and minFDE). Ablation studies further confirm the complementary contributions of adaptive modality gating and bidirectional cross-modal refinement, and a stratified dynamic analysis reveals consistent reductions in velocity estimation error across object classes, motion regimes, and environmental conditions. These results demonstrate that adaptive multimodal fusion improves robustness, motion reasoning, and perception reliability in complex traffic environments while remaining computationally efficient for deployment in safety-critical autonomous driving systems. Full article

12 pages, 583 KB  
Article
Development of the Citrus Longhorned Beetle Anoplophora chinensis (Cerambycidae: Coleoptera) on Artificial Diet and Chilling Effect on Their Life Cycle Completion
by Hai Nam Nguyen and Ki-Jeong Hong
Insects 2026, 17(3), 285; https://doi.org/10.3390/insects17030285 - 5 Mar 2026
Viewed by 647
Abstract
Anoplophora chinensis (Coleoptera: Cerambycidae) is an invasive, economically important, quarantined wood-boring pest whose long life cycle complicates laboratory rearing and management. This study investigated the combined effects of artificial diet, chilling duration, and temperature on pupation cues. Adults collected from the wild were allowed to oviposit, and newly hatched larvae were reared on a prepared artificial diet. Larval weight was recorded biweekly to assess growth and mortality. At 12 weeks of age, larvae were subjected to cold treatments at 5 °C or 10 °C for 9, 12, 14, 16, or 19 weeks, then returned to warm rearing conditions to monitor pupation. Additional chilling cycles were applied when necessary. Pupation percentages increased with chilling duration, reaching 55% after 16 weeks at 10 °C compared with 16.7% after 12 weeks and none after 9 weeks. Developmental durations were 34.43, 55.93, and 88.65 weeks for larvae experiencing one, two, and three chilling cycles, respectively. Adult body weight was consistently lower than that of field-collected individuals for both males and females. These findings confirm that chilling is essential for pupation cues and demonstrate that both duration and temperature strongly influence pupation success. Importantly, the combination of artificial diet with optimized chilling regimes enhances pupation rates, providing a practical foundation for mass-rearing protocols of A. chinensis to support future research and management programs. Full article
(This article belongs to the Special Issue Science of Insect Rearing Dynamics: Discovery-Based Inquiry)

26 pages, 20080 KB  
Article
GS-USTNet: Global–Local Adaptive Convolution with Skip-Guided Attention for Remote Sensing Image Segmentation
by Haoran Qian, Xuan Liu, Zhuang Li, Yongjie Ma and Zhenyu Lu
Remote Sens. 2026, 18(5), 785; https://doi.org/10.3390/rs18050785 - 4 Mar 2026
Viewed by 324
Abstract
Semantic segmentation of remote sensing imagery is crucial for applications such as land resource management and urban planning, yet it remains challenging due to low intra-class variation, ambiguous boundaries, and the coexistence of multi-scale geospatial features. To tackle these issues, we propose GS-USTNet, a novel architecture that enhances both feature representation and boundary recovery. First, we introduce a Global–Local Adaptive Convolution (GLAConv) module that dynamically fuses global contextual cues with local responses to generate content-aware convolutional weights, thereby improving feature discriminability. Second, we design a Skip-Guided Attention (SGA) mechanism that leverages spatial–channel joint attention to guide the decoder, effectively mitigating attention dispersion in complex scenes or under class imbalance and significantly sharpening object boundaries. Built upon the efficient USTNet framework, our model achieves substantial performance gains without compromising computational efficiency. Extensive experiments on benchmark datasets demonstrate that GS-USTNet achieves consistent improvements over the original USTNet, with gains of approximately 3.5% in overall accuracy and 6.0% in F1-score across datasets. Ablation studies further confirm the effectiveness of the proposed GLAConv and SGA modules. This work provides an efficient and robust approach for fine-grained semantic segmentation of high-resolution remote sensing imagery. Full article

33 pages, 8140 KB  
Article
Diagnosing Shortcut Learning in CNN-Based Photovoltaic Fault Recognition from RGB Images: A Multi-Method Explainability Audit
by Bogdan Marian Diaconu
AI 2026, 7(3), 94; https://doi.org/10.3390/ai7030094 - 4 Mar 2026
Viewed by 481
Abstract
Convolutional neural networks (CNNs) can achieve high accuracy in photovoltaic (PV) fault recognition from RGB imagery, yet their decisions may rely on shortcut cues induced by heterogeneous backgrounds, viewpoints, and class imbalance. This work presents a multi-method explainability audit on the Kaggle PV Panel Defect Dataset (six classes), comparing five architectures (Baseline CNN, VGG16, ResNet50, InceptionV3, EfficientNetB0). Explanations are obtained with LIME superpixel surrogates (reported together with kernel-weighted surrogate fidelity), occlusion sensitivity (quantified via IoU@Top10% against consistent proxy masks, Shannon entropy, and Hoyer sparsity), and Integrated Gradients evaluated by deletion–insertion faithfulness and a Faithfulness Gap. While ResNet50 yields the best predictive performance, EfficientNetB0 shows the most consistent faithfulness evidence and stable panel-centered attributions. The analysis highlights class-dependent vulnerability to context cues, especially for the Clean and damaged classes, and supports using quantitative explainability diagnostics during model selection and dataset curation to mitigate shortcuts in vision-based PV monitoring. Full article
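The attribution-map statistics named in this abstract (Shannon entropy, Hoyer sparsity, and IoU between the top 10% of attributions and a proxy mask) have standard definitions; the sketch below implements them on a random map purely as an illustration, independent of the audit's actual pipeline and masks.

```python
import numpy as np

def shannon_entropy(saliency: np.ndarray) -> float:
    """Entropy (bits) of a saliency map treated as a probability distribution."""
    p = np.abs(saliency).ravel()
    p = p / (p.sum() + 1e-12)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def hoyer_sparsity(saliency: np.ndarray) -> float:
    """Hoyer sparsity in [0, 1]: 0 for a uniform map, 1 for a single active pixel."""
    x = np.abs(saliency).ravel()
    n = x.size
    l1, l2 = x.sum(), np.sqrt((x ** 2).sum()) + 1e-12
    return float((np.sqrt(n) - l1 / l2) / (np.sqrt(n) - 1))

def iou_top_fraction(saliency: np.ndarray, mask: np.ndarray, fraction: float = 0.10) -> float:
    """IoU between the top-`fraction` most attributed pixels and a binary proxy mask."""
    k = max(1, int(fraction * saliency.size))
    thresh = np.partition(np.abs(saliency).ravel(), -k)[-k]
    top = np.abs(saliency) >= thresh
    inter = np.logical_and(top, mask.astype(bool)).sum()
    union = np.logical_or(top, mask.astype(bool)).sum()
    return float(inter / max(union, 1))

# Random attribution map and a hypothetical panel mask, for illustration only:
rng = np.random.default_rng(0)
saliency = rng.random((64, 64))
panel_mask = np.zeros((64, 64), dtype=bool)
panel_mask[16:48, 16:48] = True
print(shannon_entropy(saliency), hoyer_sparsity(saliency), iou_top_fraction(saliency, panel_mask))
```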

27 pages, 1058 KB  
Article
An AI-Driven Multimodal Sensor Fusion Framework for Fraud Perception in Short-Video and Live-Streaming Platforms
by Ruixiang Zhao, Xuanhao Zhang, Jinfan Yang, Haofei Li, Zhengjia Lu, Wenrui Xu and Manzhou Li
Sensors 2026, 26(5), 1525; https://doi.org/10.3390/s26051525 - 28 Feb 2026
Viewed by 485
Abstract
With the rapid proliferation of short-video platforms and live-streaming commerce ecosystems, marketing activities are increasingly manifested through complex multimodal sensing signals. These heterogeneous sensor data streams exhibit strong temporal dependency, high cross-modal coupling, and progressive evolutionary characteristics, making early-stage fraud perception particularly challenging for conventional unimodal or static analytical paradigms. Existing approaches often fail to effectively capture weak anomalous cues emerging across multimodal channels during the initial stages of fraudulent campaigns. To address these limitations, an artificial intelligence-driven multimodal sensor perception framework is proposed for temporal fraud detection in short-video environments. A multimodal temporal alignment module is first designed to synchronize heterogeneous sensor signals with inconsistent sampling granularities. Subsequently, a shared temporal encoding network is constructed to learn evolution-aware representations across multimodal sensor sequences. On this basis, a cross-modal temporal attention fusion mechanism is introduced to dynamically weight sensor contributions at different behavioral stages. Finally, a fraud evolution modeling and early risk prediction module is developed to characterize the progressive intensification of fraudulent activities and to enable risk assessment under incomplete temporal observations. Extensive experiments conducted on real-world datasets collected from multiple mainstream short-video platforms demonstrate the effectiveness of the proposed AI-driven sensing framework. The model achieves an overall accuracy of 0.941, precision of 0.865, recall of 0.812, and F1 score of 0.838, with the AUC further reaching 0.956, significantly outperforming text-based, vision-based, temporal, and conventional multimodal baselines. In early-stage detection scenarios utilizing only the first 30% of video content, the framework maintains stable performance advantages, achieving a precision of 0.812, recall of 0.704, and F1 score of 0.754, validating its capability for proactive fraud warning. Full article
(This article belongs to the Special Issue Artificial Intelligence-Driven Sensing)
