Search Results (145)

Search Parameters:
Keywords = fine-grained alignment

19 pages, 662 KB  
Article
Mind the Link: Discourse Link-Aware Hallucination Detection in Summarization
by Dawon Lee, Hyuckchul Jung and Yong Suk Choi
Appl. Sci. 2025, 15(19), 10506; https://doi.org/10.3390/app151910506 - 28 Sep 2025
Abstract
Recent studies on detecting hallucinations in summaries decompose the summary into atomic content units (ACUs) and then use natural language inference to determine whether each unit logically matches the document text. However, this approach fails to consider discourse link relations such as temporal order, causality, and purpose, so it cannot detect conflicts in the semantic connections between individual summary ACUs even when those conflicts are present in the document. To overcome this limitation, this study proposes extracting Discourse Link-Aware Content Units (DL-ACUs) by converting the summary into an Abstract Meaning Representation (AMR) graph and structuring the discourse link relations between ACUs. Additionally, to align summary ACUs with corresponding document information in a fine-grained manner, we propose the Selective Document-Atomic Content Unit (SD-ACU): for each summary ACU, only the most relevant document sentences are retrieved and then decomposed into document ACUs. Applying the DL-ACU module to existing hallucination detection systems such as FIZZ and FENICE reduces the rate of discourse link errors on FRANK, and combining both modules improves balanced accuracy and ROC-AUC across major benchmarks. This suggests the proposed method effectively captures discourse link errors while enabling ACU-to-ACU alignment.
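As a rough illustration of the SD-ACU retrieval step, the sketch below pairs a summary ACU with its top-k most relevant document sentences before decomposition into document ACUs. The bag-of-words similarity, the whitespace tokenizer, and k=2 are stand-in assumptions; the paper's actual encoder and decomposition procedure are not reproduced here.

```python
# Hedged sketch of SD-ACU-style retrieval (not the authors' code): score each
# document sentence against a summary ACU by cosine similarity over toy
# bag-of-words vectors and keep the top-k candidates for decomposition.
import numpy as np

def bow_vector(text, vocab):
    # Toy bag-of-words embedding; a real system would use a sentence encoder.
    v = np.zeros(len(vocab))
    for tok in text.lower().split():
        if tok in vocab:
            v[vocab[tok]] += 1.0
    return v

def top_k_sentences(acu, sentences, k=2):
    words = {t for s in sentences + [acu] for t in s.lower().split()}
    vocab = {w: i for i, w in enumerate(words)}
    q = bow_vector(acu, vocab)
    scored = []
    for s in sentences:
        d = bow_vector(s, vocab)
        sim = float(q @ d / (np.linalg.norm(q) * np.linalg.norm(d) + 1e-9))
        scored.append((sim, s))
    return [s for _, s in sorted(scored, reverse=True)[:k]]

doc = ["The storm hit the coast on Monday.",
       "Evacuations began after the storm made landfall.",
       "Officials praised the early warning system."]
print(top_k_sentences("Evacuations began after the storm hit", doc))
```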

21 pages, 3747 KB  
Article
Open-Vocabulary Crack Object Detection Through Attribute-Guided Similarity Probing
by Hyemin Yoon and Sangjin Kim
Appl. Sci. 2025, 15(19), 10350; https://doi.org/10.3390/app151910350 - 24 Sep 2025
Abstract
Timely detection of road surface defects such as cracks and potholes is critical for ensuring traffic safety and reducing infrastructure maintenance costs. While recent advances in image-based deep learning techniques have shown promise for automated road defect detection, existing models remain limited to closed-set detection settings, making it difficult to recognize newly emerging or fine-grained defect types. To address this limitation, we propose an attribute-aware open-vocabulary crack detection (AOVCD) framework, which leverages the alignment capability of pretrained vision–language models to generalize beyond fixed class labels. In this framework, crack types are represented as combinations of visual attributes, enabling semantic grounding between image regions and natural language descriptions. To support this, we extend the existing PPDD dataset with attribute-level annotations and incorporate a multi-label attribute recognition task as an auxiliary objective. Experimental results demonstrate that the proposed AOVCD model outperforms existing baselines. In particular, compared to CLIP-based zero-shot inference, the proposed model achieves approximately a 10-fold improvement in average precision (AP) for novel crack categories. Attribute classification performance—covering geometric, spatial, and textural features—also increases by 40% in balanced accuracy (BACC) and 23% in AP. These results indicate that integrating structured attribute information enhances generalization to previously unseen defect types, especially those involving subtle visual cues. Our study suggests that incorporating attribute-level alignment within a vision–language framework can lead to more adaptive and semantically grounded defect recognition systems.
(This article belongs to the Section Computing and Artificial Intelligence)
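A minimal sketch of the attribute-guided similarity idea described above: a crack category is scored by how well an image-region embedding agrees with the text embeddings of the category's constituent attributes. The attribute list, category-attribute map, and random vectors below are illustrative assumptions, not the paper's data or encoder.

```python
# Toy attribute-guided scoring: a category's score is the mean cosine
# similarity between a region embedding and its attributes' text embeddings.
import numpy as np

rng = np.random.default_rng(0)
DIM = 16
ATTRIBUTES = ["linear", "branching", "wide", "hairline"]   # assumed vocabulary
CATEGORIES = {"longitudinal crack": ["linear", "hairline"],
              "alligator crack": ["branching", "wide"]}

attr_emb = {a: rng.normal(size=DIM) for a in ATTRIBUTES}   # text-side stand-ins
region_emb = rng.normal(size=DIM)                          # image-region stand-in

def category_score(cat):
    sims = []
    for a in CATEGORIES[cat]:
        e = attr_emb[a]
        sims.append(region_emb @ e / (np.linalg.norm(region_emb) * np.linalg.norm(e)))
    return float(np.mean(sims))

print({c: round(category_score(c), 3) for c in CATEGORIES})
```

Because categories are composed from attributes, a new crack type can be scored at inference time simply by listing its attributes, which is what makes the setting open-vocabulary.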

26 pages, 1333 KB  
Article
Category Name Expansion and an Enhanced Multimodal Fusion Framework for Few-Shot Learning
by Tianlei Gao, Lei Lyu, Xiaoyun Xie, Nuo Wei, Yushui Geng and Minglei Shu
Entropy 2025, 27(9), 991; https://doi.org/10.3390/e27090991 - 22 Sep 2025
Abstract
With the advancement of image processing techniques, few-shot learning (FSL) has gradually become a key approach to addressing the problem of data scarcity. However, existing FSL methods often rely on unimodal information under limited sample conditions, making it difficult to capture fine-grained differences between categories. To address this issue, we propose a multimodal few-shot learning method based on category name expansion and image feature enhancement. By integrating the expanded category text with image features, the proposed method enriches the semantic representation of categories and enhances the model’s sensitivity to detailed features. To further improve the quality of cross-modal information transfer, we introduce a cross-modal residual connection strategy that aligns features across layers through progressive fusion. This approach enables the fused representations to maximize mutual information while reducing redundancy, effectively alleviating the information bottleneck caused by uneven entropy distribution between modalities and enhancing the model’s generalization ability. Experimental results demonstrate that our method achieves superior performance on both natural image datasets (CIFAR-FS and FC100) and a medical image dataset.
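One plausible reading of the cross-modal residual connection strategy, as a hedged sketch: at each layer the running image representation absorbs a projected text feature, while a residual path carries earlier signal forward. Dimensions, depth, and the projection matrices are assumptions.

```python
# Progressive cross-modal fusion with residual connections (illustrative only).
import numpy as np

rng = np.random.default_rng(1)
DIM, LAYERS = 32, 3
W_text = [rng.normal(scale=0.1, size=(DIM, DIM)) for _ in range(LAYERS)]

def progressive_fusion(img_feat, text_feat):
    h = img_feat
    for W in W_text:
        fused = np.tanh(h + W @ text_feat)  # inject expanded-category semantics
        h = h + fused                       # residual path limits information loss
    return h

out = progressive_fusion(rng.normal(size=DIM), rng.normal(size=DIM))
print(out.shape)
```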

24 pages, 2338 KB  
Article
DynaNet: A Dynamic Feature Extraction and Multi-Path Attention Fusion Network for Change Detection
by Xue Li, Dong Li, Jiandong Fang and Xueying Feng
Sensors 2025, 25(18), 5832; https://doi.org/10.3390/s25185832 - 18 Sep 2025
Abstract
Existing change detection methods often struggle with both inadequate feature fusion and interference from background noise when processing bi-temporal remote sensing imagery. These challenges are particularly pronounced in building change detection, where capturing subtle spatial and semantic dependencies is critical. To address these issues, we propose DynaNet, a dynamic feature extraction and multi-path attention fusion network for change detection. Specifically, we design a Dynamic Feature Extractor (DFE) that leverages a cross-temporal gating mechanism to amplify relevant change signals while suppressing irrelevant variations, enabling high-quality feature alignment. A Contextual Attention Module (CAM) is then employed to incorporate global contextual information, further enhancing the discriminative capability of change regions. Additionally, a Multi-Branch Attention Fusion Module (MBAFM) is introduced to model inter-scale semantic relationships through self- and cross-attention mechanisms, thereby improving the detection of fine-grained structural changes. To facilitate robust evaluation, we present a new benchmark dataset, Inner-CD, comprising 800 pairs of 256 × 256 bi-temporal satellite images with 0.5–2 m spatial resolution. Unlike existing datasets, Inner-CD features abundant buildings in both temporal images, with changes manifested as subtle morphological variations. Extensive experiments demonstrate that DynaNet achieves state-of-the-art performance, obtaining F1-scores of 90.92% on Inner-CD, 92.38% on LEVIR-CD, and 94.35% on WHU-CD.
(This article belongs to the Section Sensing and Imaging)
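A hedged sketch of the cross-temporal gating idea behind the described DFE: a sigmoid gate computed from both time steps scales the bi-temporal feature difference, amplifying change-relevant channels and damping irrelevant variation. The shapes and the linear map are assumptions.

```python
# Cross-temporal gate over per-pixel channel features (illustrative only).
import numpy as np

rng = np.random.default_rng(2)
C = 8                                    # channels at one spatial location
W = rng.normal(scale=0.1, size=(C, 2 * C))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_change(f_t1, f_t2):
    gate = sigmoid(W @ np.concatenate([f_t1, f_t2]))  # per-channel relevance
    return gate * (f_t2 - f_t1)                       # gated temporal difference

print(gated_change(rng.normal(size=C), rng.normal(size=C)))
```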

16 pages, 881 KB  
Article
Text-Guided Spatio-Temporal 2D and 3D Data Fusion for Multi-Object Tracking with RegionCLIP
by Youlin Liu, Zainal Rasyid Mahayuddin and Mohammad Faidzul Nasrudin
Appl. Sci. 2025, 15(18), 10112; https://doi.org/10.3390/app151810112 - 16 Sep 2025
Abstract
3D Multi-Object Tracking (3D MOT) is a critical task in autonomous systems, where accurate and robust tracking of multiple objects in dynamic environments is essential. Traditional approaches primarily rely on visual or geometric features, often neglecting the rich semantic information available in textual modalities. In this paper, we propose Text-Guided 3D Multi-Object Tracking (TG3MOT), a novel framework that incorporates Vision-Language Models (VLMs) into the YONTD architecture to improve 3D MOT performance. Our framework leverages RegionCLIP, a multimodal open-vocabulary detector, to achieve fine-grained alignment between image regions and textual concepts, enabling the incorporation of semantic information into the tracking process. To address challenges such as occlusion, blurring, and ambiguous object appearances, we introduce the Target Semantic Matching Module (TSM), which quantifies the uncertainty of semantic alignment and filters out unreliable regions. Additionally, we propose the 3D Feature Exponential Moving Average Module (3D F-EMA) to incorporate temporal information, improving robustness in noisy or occluded scenarios. Furthermore, the Gaussian Confidence Fusion Module (GCF) is introduced to weight historical trajectory confidences based on temporal proximity, enhancing the accuracy of trajectory management. We evaluate our framework on the KITTI dataset and compare it with the YONTD baseline. Extensive experiments demonstrate that although the overall HOTA gain of TG3MOT is modest (+0.64%), our method achieves substantial improvements in association accuracy (+0.83%) and significantly reduces ID switches (−16.7%). These improvements are particularly valuable in real-world autonomous driving scenarios, where maintaining consistent trajectories under occlusion and ambiguous appearances is crucial for downstream tasks such as trajectory prediction and motion planning. The code will be made publicly available.
(This article belongs to the Section Computing and Artificial Intelligence)
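The two temporal mechanisms above lend themselves to short formula sketches. Below, an exponential moving average smooths a track's per-frame 3D features (one reading of 3D F-EMA), and historical confidences are fused with Gaussian weights on temporal proximity (one reading of GCF). The alpha and sigma values are illustrative, not the paper's.

```python
# EMA over 3D track features and Gaussian-weighted confidence fusion (sketch).
import numpy as np

def feature_ema(features, alpha=0.7):
    ema = features[0]
    for f in features[1:]:
        ema = alpha * ema + (1 - alpha) * f     # damp noisy per-frame features
    return ema

def gaussian_conf_fusion(confs, times, t_now, sigma=2.0):
    w = np.exp(-((t_now - np.asarray(times, dtype=float)) ** 2) / (2 * sigma ** 2))
    return float(np.sum(w * np.asarray(confs)) / np.sum(w))

rng = np.random.default_rng(3)
feats = [rng.normal(size=4) for _ in range(5)]
print(feature_ema(feats))
print(gaussian_conf_fusion([0.9, 0.6, 0.8], times=[1, 2, 3], t_now=4))
```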

15 pages, 1093 KB  
Article
A Multimodal Power Sample Feature Migration Method Based on Dual Cross-Modal Information Decoupling
by Zhenyu Chen, Huaguang Yan, Jianguang Du, Yuhao Zhou, Yi Chen, Yunfeng Yan and Shuai Zhao
Appl. Sci. 2025, 15(18), 9913; https://doi.org/10.3390/app15189913 - 10 Sep 2025
Abstract
With the rapid development of energy transition and power system informatization, the efficient integration and feature migration of multimodal power data have become critical challenges for intelligent power systems. Existing methods often overlook fine-grained semantic relationships in cross-modal alignment, leading to low information utilization. This paper proposes a multimodal power sample feature migration method based on dual cross-modal information decoupling. By introducing a fine-grained image–text alignment strategy and a dual-stream attention mechanism, deep integration and efficient migration of multimodal features are achieved. Experiments demonstrate that the proposed method outperforms baseline models (e.g., LLaVA, Qwen) in power scenario description (CSD), event localization (CELC), and knowledge question answering (CKQ), with significant improvements of up to 12.8% in key metrics such as image captioning (IC) and grounded captioning (GC). The method provides a robust solution for multimodal feature migration in power inspection and real-time monitoring, showing high practical value in industrial applications.
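As one simple reading of "information decoupling", the sketch below splits an image feature into a component aligned with the paired text feature (modality-shared) and the orthogonal remainder (modality-specific). The paper's dual-stream attention is more elaborate; this shows only the geometric core of the idea.

```python
# Orthogonal decomposition of an image feature against a text direction (sketch).
import numpy as np

def decouple(img_feat, text_feat):
    t = text_feat / (np.linalg.norm(text_feat) + 1e-9)
    shared = (img_feat @ t) * t      # projection onto the text direction
    specific = img_feat - shared     # image-only residual, orthogonal to text
    return shared, specific

rng = np.random.default_rng(4)
shared, specific = decouple(rng.normal(size=8), rng.normal(size=8))
print(round(float(shared @ specific), 6))   # ~0: the parts carry disjoint info
```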

19 pages, 2435 KB  
Article
Image Sensor-Supported Multimodal Attention Modeling for Educational Intelligence
by Yanlin Chen, Yingqiu Yang, Zeyu Lan, Xinyuan Chen, Haoyuan Zhan, Lingxi Yu and Yan Zhan
Sensors 2025, 25(18), 5640; https://doi.org/10.3390/s25185640 - 10 Sep 2025
Abstract
To address the limitations of low fusion efficiency and insufficient personalization in multimodal perception for educational intelligence, a novel deep learning framework is proposed that integrates image sensor data with textual and contextual information through a cross-modal attention mechanism. The architecture employs a cross-modal alignment module to achieve fine-grained semantic correspondence between visual features captured by image sensors and associated textual elements, followed by a personalized feedback generator that incorporates learner background and task context embeddings to produce adaptive educational guidance. A cognitive weakness highlighter is introduced to enhance the discriminability of task-relevant features, enabling explicit localization and interpretation of conceptual gaps. Experiments show the proposed method outperforms conventional fusion and unimodal baselines with 92.37% accuracy, 91.28% recall, and 90.84% precision. Cross-task and noise-robustness tests confirm its stability, while ablation studies highlight the fusion module’s +4.2% accuracy gain and the attention mechanism’s +3.8% recall and +3.5% precision improvements. These results establish the proposed method as a transferable, high-performance solution for next-generation adaptive learning systems, offering precise, explainable, and context-aware feedback grounded in advanced multimodal perception modeling.
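A minimal scaled dot-product cross-attention, one standard form such a cross-modal alignment module can take: visual tokens act as queries over text tokens, so each visual feature is re-expressed in terms of its best-matching textual elements. All shapes are assumptions.

```python
# Cross-modal attention: visual queries attend over text keys/values (sketch).
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(visual, text):
    # visual: (Nv, d) queries; text: (Nt, d) keys and values
    d = visual.shape[1]
    scores = visual @ text.T / np.sqrt(d)
    return softmax(scores) @ text           # text-informed visual features

rng = np.random.default_rng(5)
print(cross_attention(rng.normal(size=(5, 16)), rng.normal(size=(7, 16))).shape)
```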

21 pages, 6632 KB  
Article
Delineating Functional Metropolitan Areas in China: A Method Based on the Tri-Dimensional PET Coupling Model
by Jiawei Zheng, Yaping Huang, Shiwei Lu, Yueheng Huang and Leizhou Zhu
Land 2025, 14(9), 1789; https://doi.org/10.3390/land14091789 - 2 Sep 2025
Abstract
Metropolitan areas have become the primary spatial form for China’s new-era urbanization. However, their boundaries have traditionally been delineated on administrative grounds, producing a notable discrepancy with actual functional connections. To tackle this challenge, this study devises and implements an innovative ‘PET’ tri-dimensional coupling model, grounded in the principle of integrated urban subsystems, to scientifically delineate functional metropolitan boundaries. The proposed method integrates Population flow (P), Economic density (E), and Transportation accessibility (T) on a fine-grained 1 km raster grid. To enhance accuracy, the crucial population flow component is simulated using a gravity model calibrated with real-world Baidu Migration data. Applying this model to 35 potential metropolitan areas yields two key findings. First, a comparative analysis with five officially approved plans reveals significant spatial alignment in core functional zones, corroborating the model’s effectiveness. Second, the delineations clearly quantify the notable difference between the ‘functional space’ shaped by socioeconomic factors and the ‘administrative space’ defined by jurisdictional boundaries. In summary, this research presents a replicable methodology for delineating functional metropolitan areas, offering technical support and policy guidance for optimizing regional planning, enhancing inter-city coordination, and advancing China’s national strategy for regional development.
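The gravity-model step admits a compact worked sketch: simulated flow between two grid cells grows with the product of their populations and decays with distance. The constant k and decay exponent below are assumptions; the paper calibrates the model against Baidu Migration data.

```python
# Gravity-model population flow between toy 1 km grid cells (illustrative).
import numpy as np

def gravity_flow(pop_i, pop_j, dist_km, k=1.0, beta=2.0):
    return k * pop_i * pop_j / (dist_km ** beta)

cells = [(5000, 0, 0), (12000, 3, 4), (800, 10, 0)]   # (population, x, y)
for i in range(len(cells)):
    for j in range(i + 1, len(cells)):
        d = np.hypot(cells[i][1] - cells[j][1], cells[i][2] - cells[j][2])
        print(f"flow {i}->{j}: {gravity_flow(cells[i][0], cells[j][0], d):.1f}")
```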

21 pages, 1863 KB  
Article
Enhancing Phytoplankton Recognition Through a Hybrid Dataset and Morphological Description-Driven Prompt Learning
by Yubo Huo, Qingxuan Lv and Junyu Dong
J. Mar. Sci. Eng. 2025, 13(9), 1680; https://doi.org/10.3390/jmse13091680 - 1 Sep 2025
Abstract
Phytoplankton plays a pivotal role in marine ecosystems and global biogeochemical cycles. Accurate identification and monitoring of phytoplankton are essential for understanding environmental dynamics and climate variations. Despite significant progress in automatic phytoplankton identification, current datasets predominantly consist of idealized laboratory images, so the resulting models show persistent limitations in the fine-grained differentiation of phytoplankton species. To achieve high accuracy and transferability for morphologically similar species and diverse ecosystems, we introduce a hybrid dataset that integrates laboratory-based observations with in situ marine environmental data. Evaluating contemporary deep learning models on this dataset reveals that CNN-based architectures offer superior stability (85.27% mAcc., 93.76% oAcc.). Multimodal learning facilitates refined phytoplankton recognition by integrating visual and textual representations, thereby enhancing the model’s semantic comprehension. We present a fine-tuned visual language model that leverages textual prompts augmented with expert-annotated morphological descriptions, significantly improving visual-semantic alignment and enabling more accurate and interpretable recognition of closely related species (84.11% mAcc., 94.48% oAcc.). Our research establishes a benchmark dataset that facilitates real-time ecological monitoring and aquatic biodiversity research, and it further contributes to the field by enhancing model robustness and transferability to diverse environmental contexts and taxonomically similar species.
(This article belongs to the Section Marine Biology)
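A hedged sketch of the prompt-augmentation idea: each species prompt is extended with an expert-style morphological description before text encoding, and an image embedding is classified by cosine similarity. The encode() function is a toy stand-in for a vision-language text encoder, and the descriptions are illustrative, not the paper's annotations.

```python
# Morphology-augmented prompts scored against an image embedding (sketch).
import numpy as np

MORPHOLOGY = {  # illustrative expert-style descriptions, not the paper's
    "Chaetoceros": "chain-forming cells with long setae",
    "Thalassiosira": "drum-shaped cells joined by a central thread",
}

def build_prompt(species):
    return f"a microscope photo of {species}, {MORPHOLOGY[species]}"

def encode(text, dim=32):
    # Toy embedding keyed on the text hash; a stand-in for a real text encoder.
    rng = np.random.default_rng(abs(hash(text)) % (2 ** 32))
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

image_emb = encode("query image")   # stand-in for a real image embedding
scores = {s: float(image_emb @ encode(build_prompt(s))) for s in MORPHOLOGY}
print(max(scores, key=scores.get), scores)
```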

30 pages, 578 KB  
Article
Two-Stage Mining of Linkage Risk for Data Release
by Runshan Hu, Yuanguo Lin, Mu Yang, Yuanhui Yu and Vladimiro Sassone
Mathematics 2025, 13(17), 2731; https://doi.org/10.3390/math13172731 - 25 Aug 2025
Abstract
Privacy risk mining, a crucial domain in data privacy protection, endeavors to uncover potential information among datasets that could be linked to individuals’ sensitive data. Existing anonymization and privacy assessment techniques either lack quantitative granularity or fail to adapt to dynamic, heterogeneous data environments. In this work, we propose a unified two-phase linkability quantification framework that systematically measures privacy risks at both the inter-dataset and intra-dataset levels. Our approach integrates unsupervised clustering on attribute distributions with record-level matching to compute interpretable, fine-grained risk scores. By aligning risk measurement with regulatory standards such as the GDPR, our framework provides a practical, scalable solution for safeguarding user privacy in evolving data-sharing ecosystems. Extensive experiments on real-world and synthetic datasets show that our method achieves up to 96.7% precision in identifying true linkage risks, outperforming the compared baseline by 13 percentage points under identical experimental settings. Ablation studies further demonstrate that the hierarchical risk fusion strategy improves sensitivity to latent vulnerabilities, providing more actionable insights than previous privacy gain-based metrics.
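The two-phase design described above can be sketched directly: phase one compares attribute distributions across datasets (inter-dataset linkability), and phase two counts exact record matches on shared quasi-identifiers (intra-dataset risk). The scoring functions and the 0.5/0.5 fusion weights are assumptions, not the paper's formulas.

```python
# Two-phase linkage-risk sketch: distribution overlap, then record matching.
from collections import Counter

def distribution_similarity(col_a, col_b):
    ca, cb = Counter(col_a), Counter(col_b)
    keys = set(ca) | set(cb)
    # Histogram overlap: 1.0 means identical attribute distributions.
    return sum(min(ca[k] / len(col_a), cb[k] / len(col_b)) for k in keys)

def record_match_rate(ds_a, ds_b, keys):
    sigs_b = {tuple(r[k] for k in keys) for r in ds_b}
    hits = sum(tuple(r[k] for k in keys) in sigs_b for r in ds_a)
    return hits / len(ds_a)

ds_a = [{"zip": "021", "age": 34}, {"zip": "021", "age": 35}]
ds_b = [{"zip": "021", "age": 34}, {"zip": "990", "age": 60}]
inter = distribution_similarity([r["zip"] for r in ds_a], [r["zip"] for r in ds_b])
intra = record_match_rate(ds_a, ds_b, keys=("zip", "age"))
print(round(0.5 * inter + 0.5 * intra, 3))   # fused, interpretable risk score
```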

39 pages, 4783 KB  
Article
Sparse-MoE-SAM: A Lightweight Framework Integrating MoE and SAM with a Sparse Attention Mechanism for Plant Disease Segmentation in Resource-Constrained Environments
by Benhan Zhao, Xilin Kang, Hao Zhou, Ziyang Shi, Lin Li, Guoxiong Zhou, Fangying Wan, Jiangzhang Zhu, Yongming Yan, Leheng Li and Yulong Wu
Plants 2025, 14(17), 2634; https://doi.org/10.3390/plants14172634 - 24 Aug 2025
Abstract
Plant disease segmentation has achieved significant progress with the help of artificial intelligence. However, deploying high-accuracy segmentation models in resource-limited settings faces three key challenges: (A) traditional dense attention mechanisms incur quadratic growth in computational complexity (O(n²d)), rendering them ill-suited for low-power hardware; (B) the naturally sparse spatial distributions and large-scale variations of leaf lesions require models that concurrently capture long-range dependencies and local details; and (C) complex backgrounds and variable lighting in field images often induce segmentation errors. To address these challenges, we propose Sparse-MoE-SAM, an efficient framework based on an enhanced Segment Anything Model (SAM). This deep learning framework integrates sparse attention mechanisms with a two-stage mixture-of-experts (MoE) decoder. The sparse attention dynamically activates key channels aligned with lesion sparsity patterns, reducing self-attention complexity while preserving long-range context. Stage 1 of the MoE decoder performs coarse-grained boundary localization; Stage 2 achieves fine-grained segmentation by leveraging specialized experts within the MoE, significantly enhancing edge discrimination accuracy. The expert repository, comprising standard, dilated, and depthwise separable convolutions, dynamically routes features through optimized processing paths based on input texture and lesion morphology, enabling robust segmentation across diverse leaf textures and plant developmental stages. Further, we design a sparse attention-enhanced Atrous Spatial Pyramid Pooling (ASPP) module to capture multi-scale contexts for both extensive lesions and small spots. Evaluations on three heterogeneous datasets (PlantVillage Extended, CVPPP, and our self-collected field images) show that Sparse-MoE-SAM achieves a mean Intersection-over-Union (mIoU) of 94.2%, surpassing standard SAM by 2.5 percentage points, while reducing computational costs by 23.7% compared to the original SAM baseline. The model also demonstrates balanced performance across disease classes and enhanced hardware compatibility. Our work validates that integrating sparse attention with MoE mechanisms sustains accuracy while drastically lowering computational demands, enabling scalable deployment of plant disease segmentation models on mobile and edge devices.
(This article belongs to the Special Issue Advances in Artificial Intelligence for Plant Research)
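Top-k sparse attention gives a concrete feel for the complexity reduction claimed above: each query keeps only its k largest scores before the softmax, so far fewer key-value pairs participate. The toy shapes and k are assumptions; the paper's channel-level sparsity pattern is more involved.

```python
# Top-k sparse scaled dot-product attention (illustrative sketch).
import numpy as np

def sparse_attention(Q, K, V, k=2):
    scores = Q @ K.T / np.sqrt(Q.shape[1])
    mask = np.full_like(scores, -np.inf)
    idx = np.argsort(scores, axis=1)[:, -k:]      # top-k keys per query
    np.put_along_axis(mask, idx, np.take_along_axis(scores, idx, axis=1), axis=1)
    e = np.exp(mask - mask.max(axis=1, keepdims=True))   # masked softmax
    return (e / e.sum(axis=1, keepdims=True)) @ V

rng = np.random.default_rng(6)
out = sparse_attention(rng.normal(size=(4, 8)), rng.normal(size=(6, 8)),
                       rng.normal(size=(6, 8)))
print(out.shape)
```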

38 pages, 4467 KB  
Article
Causal Decoupling for Temporal Knowledge Graph Reasoning via Contrastive Learning and Adaptive Fusion
by Siling Feng, Housheng Lu, Qian Liu, Peng Xu, Yujie Zheng, Bolin Chen and Mengxing Huang
Information 2025, 16(9), 717; https://doi.org/10.3390/info16090717 - 22 Aug 2025
Abstract
Temporal knowledge graphs (TKGs) are crucial for modeling evolving real-world facts and are widely applied in event forecasting and risk analysis. However, current TKG reasoning models struggle to separate causal signals from noisy observations, align temporal dynamics with semantic structures, and integrate long-term and short-term knowledge effectively. To address these challenges, we propose the Temporal Causal Contrast Graph Network (TCCGN), a unified framework that disentangles causal features from noise via orthogonal decomposition and adversarial learning; applies dual-domain contrastive learning to enhance both temporal and semantic consistency; and introduces a gated fusion module for adaptive integration of static and dynamic features across time scales. Extensive experiments on five benchmarks (ICEWS14/05-15/18, YAGO, GDELT) show that TCCGN consistently outperforms prior models. On ICEWS14, it achieves 42.46% MRR and 31.63% Hits@1, surpassing RE-GCN by 1.21 points. On the high-noise GDELT dataset, it improves MRR by 1.0%. These results highlight TCCGN’s robustness and its promise for real-world temporal reasoning tasks involving fine-grained causal inference under noisy conditions.
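The gated fusion module invites a short sketch: a sigmoid gate, computed from both inputs, interpolates per dimension between the static and dynamic entity representations. The random weight matrices below stand in for learned parameters.

```python
# Gated fusion of static and dynamic representations (illustrative sketch).
import numpy as np

rng = np.random.default_rng(7)
D = 16
W_s = rng.normal(scale=0.1, size=(D, D))
W_d = rng.normal(scale=0.1, size=(D, D))

def gated_fusion(static, dynamic):
    z = 1.0 / (1.0 + np.exp(-(W_s @ static + W_d @ dynamic)))  # per-dim gate
    return z * static + (1.0 - z) * dynamic

print(gated_fusion(rng.normal(size=D), rng.normal(size=D)).shape)
```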

17 pages, 3907 KB  
Article
Motion Intention Prediction for Lumbar Exoskeletons Based on Attention-Enhanced sEMG Inference
by Mingming Wang, Linsen Xu, Zhihuan Wang, Qi Zhu and Tao Wu
Biomimetics 2025, 10(9), 556; https://doi.org/10.3390/biomimetics10090556 - 22 Aug 2025
Abstract
Exoskeleton robots function as augmentation systems that establish mechanical couplings with the human body, substantially enhancing the wearer’s biomechanical capabilities through assistive torques. We introduce a lumbar spine-assisted exoskeleton design based on Variable-Stiffness Pneumatic Artificial Muscles (VSPAM) and develop a dynamic adaptation mechanism bridging the pneumatic drive module with human kinematic intent to facilitate human–robot cooperative control. For kinematic intent resolution, we propose a multimodal fusion architecture integrating the VGG16 convolutional network with Long Short-Term Memory (LSTM) networks. By incorporating self-attention mechanisms, we construct a fine-grained relational inference module that leverages multi-head attention weight matrices to capture global spatio-temporal feature dependencies, overcoming local feature constraints inherent in traditional algorithms. We further employ cross-attention mechanisms to achieve deep fusion of visual and kinematic features, establishing aligned intermodal correspondence to mitigate unimodal perception limitations. Experimental validation demonstrates 96.1% ± 1.2% motion classification accuracy, offering a novel technical solution for rehabilitation robotics and industrial assistance.
(This article belongs to the Special Issue Advanced Service Robots: Exoskeleton Robots 2025)
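On the signal side, a toy sketch of windowed sEMG feature extraction shows the kind of input such a pipeline consumes: slide a window over the raw signal and compute classic features (RMS and mean absolute value) per window. Window length and stride are assumptions; the paper feeds richer representations into its VGG16+LSTM stack.

```python
# Sliding-window sEMG feature extraction (illustrative sketch).
import numpy as np

def semg_features(signal, win=200, stride=100):
    feats = []
    for start in range(0, len(signal) - win + 1, stride):
        w = signal[start:start + win]
        feats.append([np.sqrt(np.mean(w ** 2)),   # root mean square
                      np.mean(np.abs(w))])        # mean absolute value
    return np.array(feats)

rng = np.random.default_rng(8)
print(semg_features(rng.normal(size=1000)).shape)   # (windows, features)
```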

21 pages, 8789 KB  
Article
Integrating Image Recognition, Sentiment Analysis, and UWB Tracking for Urban Heritage Tourism: A Multimodal Case Study in Macau
by Deng Ai, Da Kuang, Yiqi Tao and Fanbo Zeng
Sustainability 2025, 17(17), 7573; https://doi.org/10.3390/su17177573 - 22 Aug 2025
Abstract
Amid growing demands for heritage conservation and precision urban governance, this study proposes a multimodal framework to analyze tourist perception and behavior in Macau’s Historic Centre. We integrate geotagged social media images and text, ultra-wideband (UWB) pedestrian trajectories, and a LiDAR-derived 3D digital twin to examine the interplay among spatial configuration, movement, and affect. Visual content in tourist photos is classified with You Only Look Once (YOLOv8), and sentiment polarity in Weibo posts is estimated with a fine-tuned Bidirectional Encoder Representations from Transformers (BERT) model. UWB data provide fine-grained trajectories, and all modalities are georeferenced within the digital twin. Results indicate that iconic landmarks concentrate visual attention, pedestrian density, and positive sentiment, whereas peripheral sites show lower footfall yet strong emotional resonance. We further identify three coupling typologies that differentiate tourist experiences across spatial contexts. The study advances multimodal research on historic urban centers by delivering a reproducible framework that aligns image, text, and trajectory data to extract microscale patterns. Theoretically, it elucidates how spatial configuration, movement intensity, and affective expression co-produce experiential quality. Using Macau’s Historic Centre as an empirical testbed, the findings inform heritage revitalization, wayfinding, and crowd-management strategies.
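One simple way to derive coupling typologies from the three georeferenced signals is sketched below: z-score visual attention, footfall, and sentiment per site, then label each site by which signals sit above average. The site names come from the study area, but the numbers and the thresholding rule are invented for illustration.

```python
# Toy coupling typologies from photo counts, footfall, and sentiment (sketch).
import numpy as np

sites = ["Ruins of St. Paul's", "Senado Square", "peripheral temple"]
photos = np.array([950.0, 620.0, 40.0])      # assumed geotagged image counts
footfall = np.array([8000.0, 5200.0, 300.0]) # assumed UWB trajectory visits
sentiment = np.array([0.71, 0.63, 0.82])     # assumed mean post polarity

def z(x):
    return (x - x.mean()) / x.std()

for name, p, f, s in zip(sites, z(photos), z(footfall), z(sentiment)):
    label = tuple("high" if v > 0 else "low" for v in (p, f, s))
    print(name, "visual/footfall/sentiment:", label)
```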

15 pages, 2065 KB  
Article
Potential Use of Brewer’s Spent Grain By-Product as a Component for Sustainable Thermal Mortars
by Maria Manso, Joaquim Silva, Vítor Antunes, Isabel Ivo, João Canto and Cristina Guerra
Sustainability 2025, 17(16), 7557; https://doi.org/10.3390/su17167557 - 21 Aug 2025
Abstract
Buildings represent approximately 40% of total energy consumption. Net-zero energy buildings (NZEBs) have lower energy demands than conventional buildings due to improved thermal insulation combined with other passive design strategies. Thermal mortars, used in insulating plasters, help improve buildings’ energy efficiency cost-effectively, with minimal added thickness, even on irregular surfaces. Brewer’s spent grain (BSG) accounts for 85% of the by-products of the brewing industry. It is a lignocellulosic material with a composition rich in protein (20%) and fiber (70%). Given these properties, it has potential for use as a natural aggregate in mortars and as a sustainable building material aligned with circular economy principles. This work aims to characterize BSG as a natural by-product for use in thermal mortars and to identify suitable incorporation percentages. First, BSG was characterized in terms of its water content, particle size, and bulk density. Then, mortars combining BSG and fine sand at different water contents were produced and compared with a reference mortar and two commercially available thermal mortars. The mixtures were evaluated in terms of water absorption, mechanical behavior (compressive and flexural strength), and thermal behavior. BSG mortars with a 0.25 w/c ratio presented a water absorption coefficient similar to that of the reference mortar, and overall the BSG mortars showed a mechanical strength profile similar to that of conventional thermal mortars. In the thermal test, the best BSG mortar (BSG75-w/c-0.25) achieved a stationary temperature difference between surfaces that was 8% lower than that of a commercial thermal mortar and 110% higher than that of the reference mortar. In sum, the best BSG mortars had a lower w/c ratio.
