Search Results (790)

Search Parameters:
Keywords = multimodal data fusion

31 pages, 3129 KB  
Review
A Review on Gas Pipeline Leak Detection: Acoustic-Based, OGI-Based, and Multimodal Fusion Methods
by Yankun Gong, Chao Bao, Zhengxi He, Yifan Jian, Xiaoye Wang, Haineng Huang and Xintai Song
Information 2025, 16(9), 731; https://doi.org/10.3390/info16090731 (registering DOI) - 25 Aug 2025
Abstract
Pipelines play a vital role in material transportation within industrial settings. This review synthesizes detection technologies for early-stage small gas leaks from pipelines in the industrial sector, with a focus on acoustic-based methods, optical gas imaging (OGI), and multimodal fusion approaches. It covers detection principles, inherent challenges, mitigation strategies, and the state of the art (SOTA). Small leaks refer to low-flow leakage from defects with millimeter- or submillimeter-scale apertures, which poses significant detection difficulties. Acoustic detection leverages the acoustic wave signals generated by gas leaks for non-contact monitoring, offering advantages such as rapid response and broad coverage. However, its susceptibility to environmental noise interference often triggers false alarms. This limitation can be mitigated through time-frequency analysis, multi-sensor fusion, and deep-learning algorithms, which enhance leak signals, suppress background noise, and thereby improve detection robustness and accuracy. OGI uses infrared imaging to visualize leaking gas and is applicable to the detection of various polar gases. Its primary limitations include low image resolution, low contrast, and interference from complex backgrounds. Mitigation techniques involve background subtraction, optical flow estimation, fully convolutional neural networks (FCNNs), and vision transformers (ViTs), which enhance image contrast and extract multi-scale features to boost detection precision. Multimodal fusion integrates data from diverse sensors, such as acoustic and optical devices; its key challenges lie in achieving spatiotemporal synchronization across multiple sensors and effectively fusing heterogeneous data streams. Current methodologies primarily employ decision-level and feature-level fusion. Decision-level fusion offers high flexibility and ease of implementation but lacks inter-feature interaction, making it less effective than feature-level fusion when correlations exist between heterogeneous features. Feature-level fusion combines data from different modalities during the feature extraction phase, generating a unified cross-modal representation that effectively resolves inter-modal heterogeneity. In conclusion, we posit that multimodal fusion holds significant potential for pushing detection accuracy beyond the capabilities of existing single-modality technologies and is poised to become a major focus of future research in this domain.
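
To make the decision-level versus feature-level distinction drawn in this abstract concrete, here is a minimal NumPy sketch; the feature dimensions, random stand-in classifier weights, and averaging rule are all invented for illustration, not taken from the review:

```python
# Illustrative sketch (not the review's method): contrasting decision-level
# and feature-level fusion for a two-modality leak detector.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

acoustic_feat = rng.normal(size=32)   # e.g. time-frequency statistics (toy)
ogi_feat = rng.normal(size=64)        # e.g. CNN embedding of an OGI frame (toy)

# Decision-level fusion: each modality is scored independently and only the
# per-modality leak probabilities are combined (here, a simple mean).
w_ac, w_ogi = rng.normal(size=32), rng.normal(size=64)
p_acoustic = sigmoid(acoustic_feat @ w_ac)
p_ogi = sigmoid(ogi_feat @ w_ogi)
p_decision = 0.5 * (p_acoustic + p_ogi)

# Feature-level fusion: modalities are joined before classification, so a
# single model can exploit correlations between heterogeneous features.
fused = np.concatenate([acoustic_feat, ogi_feat])
w_fused = rng.normal(size=fused.size)
p_feature = sigmoid(fused @ w_fused)

print(f"decision-level leak prob: {p_decision:.3f}")
print(f"feature-level  leak prob: {p_feature:.3f}")
```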

27 pages, 3068 KB  
Article
EAR-CCPM-Net: A Cross-Modal Collaborative Perception Network for Early Accident Risk Prediction
by Wei Sun, Lili Nurliyana Abdullah, Fatimah Binti Khalid and Puteri Suhaiza Binti Sulaiman
Appl. Sci. 2025, 15(17), 9299; https://doi.org/10.3390/app15179299 - 24 Aug 2025
Abstract
Early traffic accident risk prediction in complex road environments poses significant challenges due to the heterogeneous nature and incomplete semantic alignment of multimodal data. To address this, we propose a novel Early Accident Risk Cross-modal Collaborative Perception Mechanism Network (EAR-CCPM-Net) that integrates hierarchical fusion modules and cross-modal attention mechanisms to enable semantic interaction between visual, motion, and textual modalities. The model is trained and evaluated on the newly constructed CAP-DATA dataset, incorporating advanced preprocessing techniques such as bilateral filtering and a rigorous MINI-Train-Test sampling protocol. Experimental results show that EAR-CCPM-Net achieves an AUC of 0.853, AP of 0.758, and improves the Time-to-Accident (TTA0.5) from 3.927 s to 4.225 s, significantly outperforming baseline methods. These findings demonstrate that EAR-CCPM-Net effectively enhances early-stage semantic perception and prediction accuracy, providing an interpretable solution for real-world traffic risk anticipation.
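
A minimal single-head sketch of the cross-modal attention idea named in this abstract follows; the token counts, embedding width, and single-head design are assumptions, and EAR-CCPM-Net's actual architecture is not reproduced here:

```python
# Sketch: one modality (visual) attends over another (motion) so that
# motion information is mixed into the visual stream.
import numpy as np

rng = np.random.default_rng(1)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

d = 16                                   # shared embedding width (assumed)
visual = rng.normal(size=(10, d))        # 10 visual tokens -> queries
motion = rng.normal(size=(6, d))         # 6 motion tokens -> keys/values

Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = visual @ Wq, motion @ Wk, motion @ Wv

# Scaled dot-product attention: each visual token takes a weighted average
# of motion values, enabling semantic interaction across modalities.
attn = softmax(Q @ K.T / np.sqrt(d), axis=-1)   # (10, 6) attention weights
visual_enriched = attn @ V                       # (10, d) fused features

print(attn.sum(axis=-1))      # each row of weights sums to 1
print(visual_enriched.shape)
```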

29 pages, 59556 KB  
Review
Application of Deep Learning Technology in Monitoring Plant Attribute Changes
by Shuwei Han and Haihua Wang
Sustainability 2025, 17(17), 7602; https://doi.org/10.3390/su17177602 - 22 Aug 2025
Viewed by 334
Abstract
With the advancement of remote sensing imagery and multimodal sensing technologies, monitoring plant trait dynamics has emerged as a critical area of research in modern agriculture. Traditional approaches, which rely on handcrafted features and shallow models, struggle to effectively address the complexity inherent in high-dimensional and multisource data. In contrast, deep learning, with its end-to-end feature extraction and nonlinear modeling capabilities, has substantially improved monitoring accuracy and automation. This review summarizes recent developments in the application of deep learning methods—including CNNs, RNNs, LSTMs, Transformers, GANs, and VAEs—to tasks such as growth monitoring, yield prediction, pest and disease identification, and phenotypic analysis. It further examines prominent research themes, including multimodal data fusion, transfer learning, and model interpretability. Additionally, it discusses key challenges related to data scarcity, model generalization, and real-world deployment. Finally, the review outlines prospective directions for future research, aiming to inform the integration of deep learning with phenomics and intelligent IoT systems and to advance plant monitoring toward greater intelligence and high-throughput capabilities.
(This article belongs to the Section Sustainable Agriculture)

27 pages, 1970 KB  
Review
Artificial Intelligence in Alzheimer’s Disease Diagnosis and Prognosis Using PET-MRI: A Narrative Review of High-Impact Literature Post-Tauvid Approval
by Rafail C. Christodoulou, Amanda Woodward, Rafael Pitsillos, Reina Ibrahim and Michalis F. Georgiou
J. Clin. Med. 2025, 14(16), 5913; https://doi.org/10.3390/jcm14165913 - 21 Aug 2025
Viewed by 206
Abstract
Background: Artificial intelligence (AI) is reshaping neuroimaging workflows for Alzheimer's disease (AD) diagnosis, particularly through advances in PET and MRI analysis. Since the FDA approval of Tauvid, a PET tracer targeting tau pathology, there has been a notable increase in studies applying AI to neuroimaging data. This narrative review synthesizes recent, high-impact literature to highlight clinically relevant AI applications in AD imaging. Methods: This review examined peer-reviewed studies published between January 2020 and January 2025, focusing on the use of AI, including machine learning, deep learning, and hybrid models, for diagnostic and prognostic tasks in AD using PET and/or MRI. Studies were identified through targeted PubMed, Scopus, and Embase searches, emphasizing methodological diversity and clinical relevance. Results: A total of 111 studies were categorized into five thematic areas: image preprocessing and segmentation, diagnostic classification, prognosis and disease staging, multimodal data fusion, and emerging innovations. Deep learning models such as convolutional neural networks (CNNs), generative adversarial networks (GANs), and transformer-based architectures were widely employed in AD research. Several models reported strong diagnostic performance, but methodological challenges such as reproducibility, small sample sizes, and a lack of external validation limit clinical translation. Trends in explainable AI, synthetic imaging, and integration of clinical biomarkers are also discussed. Conclusions: AI is rapidly advancing the field of AD imaging, offering tools for enhanced segmentation, staging, and early diagnosis. Multimodal approaches and biomarker-guided models show particular promise. However, future research must focus on reproducibility, interpretability, and standardized validation to bridge the gap between research and clinical practice.

22 pages, 6265 KB  
Article
A Multi-Level Fusion Framework for Bearing Fault Diagnosis Using Multi-Source Information
by Xiaojun Deng, Yuanhao Sun, Lin Li and Xia Peng
Processes 2025, 13(8), 2657; https://doi.org/10.3390/pr13082657 - 21 Aug 2025
Viewed by 131
Abstract
Rotating machinery is essential to modern industrial systems, where rolling bearings play a critical role in ensuring mechanical stability and operational efficiency. Failures in bearings can result in serious safety risks and significant financial losses, which highlights the need for accurate and robust methods for diagnosing bearing faults. Traditional diagnostic methods relying on single-source data often fail to fully leverage the rich information provided by multiple sensors and are more prone to performance degradation under noisy conditions. Therefore, this paper proposes a novel bearing fault diagnosis method based on a multi-level fusion framework. First, the Symmetrized Dot Pattern (SDP) method is applied to fuse multi-source signals into unified SDP images, enabling effective fusion at the data level. Then, a combination of RepLKNet and Bidirectional Gated Recurrent Unit (BiGRU) networks extracts multi-modal features, which are fused through a cross-attention mechanism to enhance feature representation. Finally, information entropy is utilized to assess the reliability of each feature channel, enabling dynamic weighting to further strengthen model robustness. The experiments conducted on public datasets and noise-augmented datasets demonstrate that the proposed method significantly surpasses other single-source and multi-source data fusion models in terms of diagnostic accuracy and robustness to noise.
(This article belongs to the Section Process Control and Monitoring)
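
One common way to implement the entropy-based channel weighting this abstract describes is sketched below; the exact rule used in the paper may differ, and the feature values here are random placeholders:

```python
# Hedged sketch: channels whose activation distribution is more
# concentrated (lower entropy) are treated as more reliable and up-weighted.
import numpy as np

rng = np.random.default_rng(2)
feats = np.abs(rng.normal(size=(4, 8)))   # 4 channels x 8 feature values (toy)

# Normalize each channel to a probability distribution, then take its entropy.
p = feats / feats.sum(axis=1, keepdims=True)
entropy = -(p * np.log(p + 1e-12)).sum(axis=1)

# Lower entropy -> higher weight; weights renormalized to sum to 1.
raw_w = 1.0 / (entropy + 1e-12)
weights = raw_w / raw_w.sum()

weighted_feats = feats * weights[:, None]   # dynamic channel re-weighting
print(np.round(weights, 3))
```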

45 pages, 2283 KB  
Review
Agricultural Image Processing: Challenges, Advances, and Future Trends
by Xuehua Song, Letian Yan, Sihan Liu, Tong Gao, Li Han, Xiaoming Jiang, Hua Jin and Yi Zhu
Appl. Sci. 2025, 15(16), 9206; https://doi.org/10.3390/app15169206 - 21 Aug 2025
Viewed by 131
Abstract
Agricultural image processing technology plays a critical role in enabling precise disease detection, accurate yield prediction, and various smart agriculture applications. However, its practical implementation faces key challenges, including environmental interference, data scarcity and imbalanced datasets, and the difficulty of deploying models on resource-constrained edge devices. This paper presents a systematic review of recent advances in addressing these challenges, with a focus on three core aspects: environmental robustness, data efficiency, and model deployment. The study identifies that attention mechanisms, Transformers, multi-scale feature fusion, and domain adaptation can enhance model robustness under complex conditions. Self-supervised learning, transfer learning, GAN-based data augmentation, SMOTE improvements, and focal loss optimization effectively alleviate data limitations. Furthermore, model compression techniques such as pruning, quantization, and knowledge distillation facilitate efficient deployment. Future research should emphasize multi-modal fusion, causal reasoning, edge–cloud collaboration, and dedicated hardware acceleration. Integrating agricultural expertise with AI is essential for promoting large-scale adoption and achieving intelligent, sustainable agricultural systems.
(This article belongs to the Special Issue Pattern Recognition Applications of Neural Networks and Deep Learning)
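
The focal loss mentioned in this abstract has a standard closed form, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t); a minimal NumPy version with the commonly used defaults (alpha = 0.25, gamma = 2), not tied to any specific study in this listing, is:

```python
# Focal loss for binary labels: easy examples (p_t near 1) are down-weighted
# by (1 - p_t)^gamma, so training focuses on hard, often minority, samples.
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
    """p: predicted positive-class probabilities; y: 0/1 labels."""
    p_t = np.where(y == 1, p, 1.0 - p)          # prob of the true class
    a_t = np.where(y == 1, alpha, 1.0 - alpha)  # class-balance factor
    return -(a_t * (1.0 - p_t) ** gamma * np.log(p_t + eps)).mean()

p = np.array([0.9, 0.6, 0.1, 0.95])   # toy predictions
y = np.array([1, 1, 0, 0])            # toy labels
print(focal_loss(p, y))
```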

18 pages, 1993 KB  
Article
Fault Line Selection in Distribution Networks Based on Dual-Channel Time-Frequency Fusion Network
by Yuyi Ma, Wei Guo, Yuntao Shi, Jianing Guan, Yushuai Qi, Xiang Yin and Gang Liu
Mathematics 2025, 13(16), 2687; https://doi.org/10.3390/math13162687 - 21 Aug 2025
Viewed by 191
Abstract
In distribution networks, single-phase ground faults often lead to abnormal changes in voltage and current signals. Traditional single-modal fault diagnosis methods usually struggle to accurately identify the fault line under such conditions. To address this issue, this paper proposes a fault line identification method based on a multimodal feature fusion model. The approach combines time-frequency images, generated with a fused Short-Time Fourier Transform (STFT) and Wigner–Ville Distribution (WVD) algorithm, with one-dimensional time-series signals for classification. The time-frequency images visualize both temporal and spectral features of the signal and are processed by the RepLKNet model for deep feature extraction. Meanwhile, the raw one-dimensional time-series signals preserve the original temporal dependencies and are analyzed by a BiGRU network enhanced with a global attention mechanism to improve feature representation. Finally, features from both modalities are extracted in parallel and fused to achieve accurate fault line identification. Experimental results demonstrate that the proposed method effectively leverages the complementary nature of multimodal data and shows strong robustness in the presence of noise interference.
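
The STFT half of the time-frequency image branch can be sketched with SciPy (SciPy ships no Wigner–Ville Distribution, so the WVD part of the paper's fused algorithm is omitted); the signal content and sampling parameters below are invented:

```python
# Sketch: an STFT turns a 1-D fault recording into a 2-D time-frequency
# image that a CNN such as RepLKNet can consume.
import numpy as np
from scipy.signal import stft

fs = 10_000                                   # sampling rate (Hz), assumed
t = np.arange(0, 0.2, 1 / fs)
# Toy "fault" signal: 50 Hz mains component plus a decaying transient.
x = np.sin(2 * np.pi * 50 * t) + np.exp(-30 * t) * np.sin(2 * np.pi * 1500 * t)

f, seg_t, Zxx = stft(x, fs=fs, nperseg=256)
tf_image = np.abs(Zxx)                        # magnitude spectrogram
print(tf_image.shape)                         # (freq bins, time frames)
```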

20 pages, 2496 KB  
Article
Mine-DW-Fusion: BEV Multiscale-Enhanced Fusion Object-Detection Model for Underground Coal Mine Based on Dynamic Weight Adjustment
by Wanzi Yan, Yidong Zhang, Minti Xue, Zhencai Zhu, Hao Lu, Xin Zhang, Wei Tang and Keke Xing
Sensors 2025, 25(16), 5185; https://doi.org/10.3390/s25165185 - 20 Aug 2025
Viewed by 373
Abstract
Environmental perception is crucial for achieving autonomous driving of auxiliary haulage vehicles in underground coal mines. The complex underground environment and working conditions, such as dust pollution, uneven lighting, and sensor data abnormalities, pose challenges to multimodal fusion perception. These challenges include: (1) the lack of a reasonable and effective method for evaluating the reliability of different modality data; (2) the absence of in-depth fusion methods for different modality data that can handle sensor failures; and (3) the lack of a multimodal dataset for underground coal mines to support model training. To address these issues, this paper proposes a coal mine underground BEV multiscale-enhanced fusion perception model based on dynamic weight adjustment. First, camera and LiDAR modality data are uniformly mapped into BEV space to achieve multimodal feature alignment. Then, a Mixture of Experts-Fuzzy Logic Inference Module (MoE-FLIM) is designed to infer weights for different modality data based on BEV feature dimensions. Next, a Pyramid Multiscale Feature Enhancement and Fusion Module (PMS-FFEM) is introduced to ensure the model's perception performance in the event of sensor data abnormalities. Lastly, a multimodal dataset for underground coal mines is constructed to provide support for model training and testing in real-world scenarios. Experimental results show that the proposed method demonstrates good accuracy and stability in object-detection tasks in coal mine underground environments, maintaining high detection performance, especially in typical complex scenes such as low light and dust fog.
(This article belongs to the Section Remote Sensors)
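
A loose sketch of dynamic modality weighting in a shared BEV space follows; softmax gating on a crude global statistic stands in for the paper's fuzzy-logic expert inference (MoE-FLIM), and all shapes and signals are assumptions:

```python
# Sketch: a gating function scores each modality's BEV feature map and the
# fused map is their weighted sum, so a degraded sensor can be down-weighted.
import numpy as np

rng = np.random.default_rng(3)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

H, W, C = 8, 8, 4
bev_camera = rng.normal(size=(H, W, C))   # camera features in BEV (toy)
bev_lidar = rng.normal(size=(H, W, C))    # LiDAR features in BEV (toy)

# One crude reliability proxy per modality: mean absolute activation.
scores = np.array([np.abs(bev_camera).mean(), np.abs(bev_lidar).mean()])
w_cam, w_lid = softmax(scores)

bev_fused = w_cam * bev_camera + w_lid * bev_lidar
print(f"camera weight {w_cam:.2f}, lidar weight {w_lid:.2f}")
```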

21 pages, 3474 KB  
Article
DFF: Sequential Dual-Branch Feature Fusion for Maritime Radar Object Detection and Tracking via Video Processing
by Donghui Li, Yu Xia, Fei Cheng, Cheng Ji, Jielu Yan, Weizhi Xian, Xuekai Wei, Mingliang Zhou and Yi Qin
Appl. Sci. 2025, 15(16), 9179; https://doi.org/10.3390/app15169179 - 20 Aug 2025
Viewed by 154
Abstract
Robust maritime radar object detection and tracking in maritime clutter environments is critical for maritime safety and security. Conventional Constant False Alarm Rate (CFAR) detectors have limited performance in processing complex-valued radar echoes, especially in complex scenarios where phase information is critical and in the real-time processing of successive echo pulses, while existing deep learning methods usually lack native support for complex-valued data and have inherent shortcomings in real-time performance compared with conventional methods. To overcome these limitations, we propose a dual-branch sequence feature fusion (DFF) detector designed specifically for complex-valued continuous sea-clutter signals, drawing on methods commonly used in video pattern recognition. The DFF employs dual parallel complex-valued U-Net branches to extract multilevel spatiotemporal features from distance profiles and Doppler features from distance–Doppler spectrograms, preserving the critical phase–amplitude relationship. Subsequently, the sequential feature-extraction module (SFEM) captures the temporal dependence in both feature streams. Next, the Adaptive Weight Learning (AWL) module dynamically fuses these multimodal features by learning modality-specific weights. Finally, the detection module generates the object localisation output. Extensive evaluations on the IPIX and SDRDSP datasets show that DFF performs well. On SDRDSP, DFF achieves 98.76% accuracy and a 68.75% F1 score, significantly outperforming traditional CFAR methods and state-of-the-art deep learning models in terms of detection accuracy and false alarm rate (FAR). These results validate the effectiveness of DFF for reliable maritime object detection in complex clutter environments through multimodal feature fusion and sequence-dependent modelling.
(This article belongs to the Section Computing and Artificial Intelligence)

17 pages, 3307 KB  
Article
Electrode-Free ECG Monitoring with Multimodal Wireless Mechano-Acoustic Sensors
by Zhi Li, Fei Fei and Guanglie Zhang
Biosensors 2025, 15(8), 550; https://doi.org/10.3390/bios15080550 - 20 Aug 2025
Viewed by 131
Abstract
Continuous cardiovascular monitoring is essential for the early detection of cardiac events, but conventional electrode-based ECG systems cause skin irritation and are unsuitable for long-term wear. We propose an electrode-free ECG monitoring approach that leverages synchronized phonocardiogram (PCG) and seismocardiogram (SCG) signals captured by wireless mechano-acoustic sensors. PCG provides precise valvular event timings, while SCG provides mechanical context, enabling the robust identification of systolic/diastolic intervals and pathological patterns. A deep learning model reconstructs ECG waveforms by intelligently combining mechano-acoustic sensor data. Its architecture leverages specialized neural network components to identify and correlate key cardiac signatures from multimodal inputs. Experimental validation on an IoT sensor dataset yields a mean Pearson correlation of 0.96 and an RMSE of 0.49 mV compared to clinical ECGs. By eliminating skin-contact electrodes through PCG–SCG fusion, this system enables robust IoT-compatible daily-life cardiac monitoring.
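
The two reported metrics (Pearson correlation and RMSE) are standard and easy to compute; below is a sketch on synthetic stand-in waveforms, not data from the paper:

```python
# Sketch: comparing a "reconstructed" waveform against a reference using
# Pearson correlation and root-mean-square error.
import numpy as np

t = np.linspace(0, 1, 500)
ecg_ref = np.sin(2 * np.pi * 1.2 * t) ** 15          # crude R-peak-like shape (toy)
ecg_rec = ecg_ref + np.random.default_rng(4).normal(scale=0.05, size=t.size)

r = np.corrcoef(ecg_ref, ecg_rec)[0, 1]              # Pearson correlation
rmse = np.sqrt(np.mean((ecg_ref - ecg_rec) ** 2))    # RMSE
print(f"Pearson r = {r:.3f}, RMSE = {rmse:.3f}")
```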

29 pages, 7018 KB  
Article
Real-Time Efficiency Prediction in Nonlinear Fractional-Order Systems via Multimodal Fusion
by Biao Ma and Shimin Dong
Fractal Fract. 2025, 9(8), 545; https://doi.org/10.3390/fractalfract9080545 - 19 Aug 2025
Viewed by 233
Abstract
Rod pump systems are complex nonlinear processes, and conventional efficiency prediction methods for such systems typically rely on high-order fractional partial differential equations, which severely constrain real-time inference. Motivated by the increasing availability of measured electrical power data, this paper introduces a series of multimodal-feature-fusion models for predicting the efficiency of nonlinear fractional-order PDE systems. First, three single-model prediction approaches (Asymptotic Cross-Fusion, Adaptive-Weight Late-Fusion, and Two-Stage Progressive Feature Fusion) are presented; next, two ensemble approaches are developed, one based on a Parallel-Cascaded Ensemble strategy and the other on Data Envelopment Analysis; finally, by balancing base-learner diversity with predictive accuracy, a multi-strategy ensemble prediction model is devised for online rod pump system efficiency estimation. Comprehensive experiments and ablation studies on data from 3938 oil wells demonstrate that the proposed methods deliver high predictive accuracy while meeting real-time performance requirements.
(This article belongs to the Special Issue Artificial Intelligence and Fractional Modelling for Energy Systems)
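
Generic adaptive-weight late fusion can be sketched in a few lines; the weighting rule (inverse validation error) and all numbers are assumptions, and the paper's DEA-based ensemble is more involved than this:

```python
# Sketch: each base model's weight comes from its validation error; the
# ensemble prediction is the weighted sum of per-model predictions.
import numpy as np

val_rmse = np.array([0.08, 0.05, 0.11])      # per-model validation errors (toy)
raw_w = 1.0 / val_rmse                       # lower error -> higher weight
weights = raw_w / raw_w.sum()

preds = np.array([0.61, 0.58, 0.66])         # efficiency predictions (toy)
efficiency = weights @ preds
print(f"weights {np.round(weights, 3)}, fused prediction {efficiency:.3f}")
```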

23 pages, 391 KB  
Review
A Survey on Vehicle Trajectory Prediction Procedures for Intelligent Driving
by Tingjing Wang, Daiquan Xiao, Xuecai Xu and Quan Yuan
Sensors 2025, 25(16), 5129; https://doi.org/10.3390/s25165129 - 19 Aug 2025
Viewed by 477
Abstract
This survey provides a comprehensive review of vehicle trajectory prediction procedures for intelligent driving from both theoretical and practical perspectives. Vehicle trajectory prediction procedures are explained in terms of the perception layer, the core technology of trajectory prediction, the decision-making layer, and scenario application. In the perception layer, various sensors, vision-based perception devices, and multimodal fusion perception devices are enumerated. Additionally, the vision-based multimodal perception and pure visual perception techniques employed in the top five intelligent vehicles in China are introduced. Regarding the core technology of trajectory prediction, the methods are categorized into short-term and long-term domains: the former includes physics-based and machine learning algorithms, whereas the latter involves deep learning and driving-intention-related algorithms. Likewise, the core technologies adopted in the top five intelligent vehicles are summarized. As for the decision-making layer, three main categories are summarized theoretically and practically: decision-making and planning for cooperation, super-computing and closed-loop, and real-time and optimization. As for scenario application, open and closed scenarios are discussed in theory and practice. Finally, the research outlook on vehicle trajectory prediction is presented, covering data collection, trajectory prediction methods, generalization and transferability, and real-world application. The results provide potential insights for researchers and practitioners in the vehicle trajectory prediction field and guide future advancements in the field.
(This article belongs to the Section Vehicular Sensing)

26 pages, 36602 KB  
Article
FE-MCFN: Fuzzy-Enhanced Multi-Scale Cross-Modal Fusion Network for Hyperspectral and LiDAR Joint Data Classification
by Shuting Wei, Mian Jia and Junyi Duan
Algorithms 2025, 18(8), 524; https://doi.org/10.3390/a18080524 - 18 Aug 2025
Viewed by 346
Abstract
With the rapid advancement of remote sensing technologies, the joint classification of hyperspectral image (HSI) and LiDAR data has become a key research focus in the field. Such classification must contend with the inherent uncertainties of hyperspectral images, such as the "same spectrum, different materials" and "same material, different spectra" phenomena and the complexity of spectral features; furthermore, existing multimodal fusion approaches often fail to fully leverage the complementary advantages of hyperspectral and LiDAR data. To address these issues, we propose a fuzzy-enhanced multi-scale cross-modal fusion network (FE-MCFN) for the joint classification of hyperspectral and LiDAR data. The FE-MCFN enhances convolutional neural networks through the application of fuzzy theory and effectively integrates global contextual information via a cross-modal attention mechanism. The fuzzy learning module utilizes a Gaussian membership function to assign weights to features, thereby adeptly capturing uncertainties and subtle distinctions within the data. To maximize the complementary advantages of multimodal data, a fuzzy fusion module is designed, which is grounded in fuzzy rules and integrates multimodal features across various scales while taking into account both local features and global information, ultimately enhancing the model's classification performance. Experimental results obtained from the Houston2013, Trento, and MUUFL datasets demonstrate that the proposed method outperforms current state-of-the-art classification techniques, thereby validating its effectiveness and applicability across diverse scenarios.
(This article belongs to the Section Databases and Data Structures)
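
A small sketch of the Gaussian-membership weighting described for the fuzzy learning module; the centers and widths here are computed from toy statistics rather than learned, as they would be in FE-MCFN:

```python
# Sketch: mu(x) = exp(-(x - c)^2 / (2 * sigma^2)) maps each feature to a
# soft degree of membership in [0, 1], which then scales the feature,
# giving an uncertainty-aware weighting.
import numpy as np

def gaussian_membership(x, c, sigma):
    return np.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

rng = np.random.default_rng(5)
spectra = rng.normal(size=(3, 6))            # 3 pixels x 6 spectral features (toy)

c = spectra.mean(axis=0)                     # per-feature center (assumed)
sigma = spectra.std(axis=0) + 1e-6           # per-feature width (assumed)

mu = gaussian_membership(spectra, c, sigma)  # soft memberships
fuzzified = spectra * mu                     # membership-weighted features
print(np.round(mu, 2))
```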

22 pages, 5692 KB  
Article
RiceStageSeg: A Multimodal Benchmark Dataset for Semantic Segmentation of Rice Growth Stages
by Jianping Zhang, Tailai Chen, Yizhe Li, Qi Meng, Yanying Chen, Jie Deng and Enhong Sun
Remote Sens. 2025, 17(16), 2858; https://doi.org/10.3390/rs17162858 - 16 Aug 2025
Viewed by 396
Abstract
The accurate identification of rice growth stages is critical for precision agriculture, crop management, and yield estimation. Remote sensing technologies, particularly multimodal approaches that integrate high spatial and hyperspectral resolution imagery, have demonstrated great potential in large-scale crop monitoring. Multimodal data fusion offers complementary and enriched spectral–spatial information, providing novel pathways for crop growth stage recognition in complex agricultural scenarios. However, the lack of publicly available multimodal datasets specifically designed for rice growth stage identification remains a significant bottleneck that limits the development and evaluation of relevant methods. To address this gap, we present RiceStageSeg, a multimodal benchmark dataset captured by unmanned aerial vehicles (UAVs), designed to support the development and assessment of segmentation models for rice growth monitoring. RiceStageSeg contains paired centimeter-level RGB and 10-band multispectral (MS) images acquired during several critical rice growth stages, including jointing and heading. Each image is accompanied by fine-grained, pixel-level annotations that distinguish between the different growth stages. We establish baseline experiments using several state-of-the-art semantic segmentation models under both unimodal (RGB-only, MS-only) and multimodal (RGB + MS fusion) settings. The experimental results demonstrate that multimodal feature-level fusion outperforms unimodal approaches in segmentation accuracy. RiceStageSeg offers a standardized benchmark to advance future research in multimodal semantic segmentation for agricultural remote sensing. The dataset will be made publicly available on GitHub (v0.11.0; accessed on 1 August 2025).

46 pages, 12839 KB  
Article
Tree Type Classification from ALS Data: A Comparative Analysis of 1D, 2D, and 3D Representations Using ML and DL Models
by Sead Mustafić, Mathias Schardt and Roland Perko
Remote Sens. 2025, 17(16), 2847; https://doi.org/10.3390/rs17162847 - 15 Aug 2025
Viewed by 406
Abstract
Accurate classification of individual tree types is a key component in forest inventory, biodiversity monitoring, and ecological modeling. This study evaluates and compares multiple Machine Learning (ML) and Deep Learning (DL) approaches for tree type classification based on Airborne Laser Scanning (ALS) data. A mixed-species forest in southeastern Austria, Europe, served as the test site, with spruce, pine, and a grouped class of broadleaf species as target categories. To examine the impact of data representation, ALS point clouds were transformed into four distinct structures: 1D feature vectors, 2D raster profiles, 3D voxel grids, and unstructured 3D point clouds. A comprehensive dataset, combining field measurements and manually annotated aerial data, was used to train and validate 45 ML and DL models. Results show that DL models based on 3D point clouds achieved the highest overall accuracy (up to 88.1%), followed by multi-view 2D raster and voxel-based methods. Traditional ML models performed well on 1D data but struggled with high-dimensional inputs. Spruce trees were classified most reliably, while confusion between pine and broadleaf species remained challenging across methods. The study highlights the importance of selecting suitable data structures and model types for operational tree classification and outlines potential directions for improving accuracy through multimodal and temporal data fusion.
(This article belongs to the Section Forest Remote Sensing)
