Search Results (248)

Search Parameters:
Keywords = instance perception

36 pages, 1204 KB  
Article
A UAV-Based Framework for Visual Detection and Geospatial Mapping of Real Road Surface Defects
by Paula López, Pablo Zubasti, Jesús García and Jose M. Molina
Drones 2026, 10(2), 119; https://doi.org/10.3390/drones10020119 (registering DOI) - 7 Feb 2026
Abstract
Accurate detection of road surface defects and their integration into geospatial representations are key requirements for scalable UAV-based inspection and maintenance systems. This work presents a lightweight processing pipeline that converts image-based pavement defect segmentations into compact geospatial vector representations suitable for integration with GIS-driven inspection workflows. In addition, we introduce and publicly release a UAV-based road defect dataset with pixel-level annotations, specifically designed for crack-like pavement damage. A deep convolutional neural network is trained to perform semantic segmentation of pavement defects using images derived from the publicly available RDD2022 dataset. Segmentation performance is evaluated across a range of probability thresholds using standard pixel-wise metrics, and a validation-selected operating point is used to generate binary defect masks. These masks are subsequently processed to identify individual defect instances and extract vector polygons that preserve the underlying geometry of crack-like structures. For illustrative geospatial integration, predicted defects are projected into geographic coordinates and exported in standard GIS formats. By transforming dense segmentation outputs into compact georeferenced polygons, the proposed framework bridges deep learning-based perception and GIS-based infrastructure assessment, enabling instance-level geometric analysis and providing a practical representation for UAV-based road inspection scenarios. Full article
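The mask-to-polygon and georeferencing steps summarized in this abstract can be illustrated with a short, generic sketch. It uses OpenCV contour extraction and a made-up affine pixel-to-world transform; it is not the authors' released pipeline, and the transform values and class name are placeholders.

```python
import json
import cv2
import numpy as np

def mask_to_geojson(binary_mask: np.ndarray, affine: np.ndarray) -> dict:
    """Convert a binary defect mask into a GeoJSON FeatureCollection.

    `affine` is a hypothetical 2x3 pixel-to-geographic transform
    [[a, b, tx], [c, d, ty]] obtained from the UAV georeferencing step.
    """
    # Each external contour is treated as one defect instance.
    contours, _ = cv2.findContours(
        binary_mask.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE
    )
    features = []
    for cnt in contours:
        pts = cnt.reshape(-1, 2).astype(np.float64)       # (N, 2) pixel coordinates
        geo = pts @ affine[:, :2].T + affine[:, 2]        # pixel -> lon/lat
        ring = geo.tolist() + [geo[0].tolist()]           # close the polygon ring
        features.append({
            "type": "Feature",
            "geometry": {"type": "Polygon", "coordinates": [ring]},
            "properties": {"class": "crack"},             # placeholder class label
        })
    return {"type": "FeatureCollection", "features": features}

# Toy example: a synthetic mask and an arbitrary affine transform for illustration.
mask = np.zeros((512, 512), dtype=np.uint8)
mask[100:120, 50:400] = 1
affine = np.array([[1e-6, 0.0, 8.40], [0.0, -1e-6, 41.60]])
with open("defects.geojson", "w") as f:
    json.dump(mask_to_geojson(mask, affine), f)
```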
27 pages, 23394 KB  
Article
YOLO-MSRF: A Multimodal Segmentation and Refinement Framework for Tomato Fruit Detection and Segmentation with Count and Size Estimation Under Complex Illumination
by Ao Li, Chunrui Wang, Aichen Wang, Jianpeng Sun, Fengwei Gu and Tianxue Zhang
Agriculture 2026, 16(2), 277; https://doi.org/10.3390/agriculture16020277 - 22 Jan 2026
Viewed by 146
Abstract
Segmentation of tomato fruits under complex lighting conditions remains technically challenging, especially in low illumination or overexposure, where RGB-only methods often suffer from blurred boundaries and missed small or occluded instances, and simple multimodal fusion cannot fully exploit complementary cues. To address these gaps, we propose YOLO-MSRF, a lightweight RGB–NIR multimodal segmentation and refinement framework for robust tomato perception in facility agriculture. Firstly, we propose a dual-branch multimodal backbone, introduce Cross-Modality Difference Complement Fusion (C-MDCF) for difference-based complementary RGB–NIR fusion, and design C2f-DCB to reduce computation while strengthening feature extraction. Furthermore, we develop a cross-scale attention fusion network and introduce the proposed MS-CPAM to jointly model multi-scale channel and position cues, strengthening fine-grained detail representation and spatial context aggregation for small and occluded tomatoes. Finally, we design the Multi-Scale Fusion and Semantic Refinement Network, MSF-SRNet, which combines the Scale-Concatenate Fusion Module (Scale-Concat) fusion with SDI-based cross-layer detail injection to progressively align and refine multi-scale features, improving representation quality and segmentation accuracy. Extensive experiments show that YOLO-MSRF achieves substantial gains under weak and low-light conditions, where RGB-only models are most prone to boundary degradation and missed instances, and it still delivers consistent improvements on the mixed four-light validation set, increasing mAP0.5 by 2.3 points, mAP0.5:0.95 by 2.4 points, and mIoU by 3.60 points while maintaining real-time inference at 105.07 FPS. The proposed system further supports counting, size estimation, and maturity analysis of harvestable tomatoes, and can be integrated with depth sensing and yield estimation to enable real-time yield prediction in practical greenhouse operations. Full article
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)
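As a rough illustration of difference-based RGB–NIR feature fusion (the general idea behind modules such as the paper's C-MDCF), the PyTorch snippet below injects a gated residual computed from the cross-modal difference into the RGB stream. The channel width and layer choices are assumptions, not the authors' design.

```python
import torch
import torch.nn as nn

class DifferenceComplementFusion(nn.Module):
    """Toy difference-based RGB-NIR feature fusion (illustrative only)."""
    def __init__(self, channels: int = 64):
        super().__init__()
        # Project the cross-modal difference into a complementary residual.
        self.diff_proj = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.SiLU(),
        )
        self.gate = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())

    def forward(self, feat_rgb: torch.Tensor, feat_nir: torch.Tensor) -> torch.Tensor:
        diff = feat_nir - feat_rgb                 # cues NIR sees that RGB misses
        comp = self.diff_proj(diff)                # complementary residual
        return feat_rgb + self.gate(comp) * comp   # gated injection into the RGB stream

fused = DifferenceComplementFusion(64)(torch.randn(1, 64, 80, 80),
                                       torch.randn(1, 64, 80, 80))
print(fused.shape)  # torch.Size([1, 64, 80, 80])
```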

21 pages, 15860 KB  
Article
Robot Object Detection and Tracking Based on Image–Point Cloud Instance Matching
by Hongxing Wang, Rui Zhu, Zelin Ye and Yaxin Li
Sensors 2026, 26(2), 718; https://doi.org/10.3390/s26020718 - 21 Jan 2026
Viewed by 236
Abstract
Effectively fusing the rich semantic information from camera images with the high-precision geometric measurements provided by LiDAR point clouds is a key challenge in mobile robot environmental perception. To address this problem, this paper proposes a highly extensible instance-aware fusion framework designed to achieve efficient alignment and unified modeling of heterogeneous sensory data. The proposed approach adopts a modular processing pipeline. First, semantic instance masks are extracted from RGB images using an instance segmentation network, and a projection mechanism is employed to establish spatial correspondences between image pixels and LiDAR point cloud measurements. Subsequently, three-dimensional bounding boxes are reconstructed through point cloud clustering and geometric fitting, and a reprojection-based validation mechanism is introduced to ensure consistency across modalities. Building upon this representation, the system integrates a data association module with a Kalman filter-based state estimator to form a closed-loop multi-object tracking framework. Experimental results on the KITTI dataset demonstrate that the proposed system achieves strong 2D and 3D detection performance across different difficulty levels. In multi-object tracking evaluation, the method attains a MOTA score of 47.8 and an IDF1 score of 71.93, validating the stability of the association strategy and the continuity of object trajectories in complex scenes. Furthermore, real-world experiments on a mobile computing platform show an average end-to-end latency of only 173.9 ms, while ablation studies further confirm the effectiveness of individual system components. Overall, the proposed framework exhibits strong performance in terms of geometric reconstruction accuracy and tracking robustness, and its lightweight design and low latency satisfy the stringent requirements of practical robotic deployment. Full article
(This article belongs to the Section Sensors and Robotics)
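The projection step that associates LiDAR points with image instance masks can be sketched with standard pinhole geometry; the NumPy function below is a generic illustration (extrinsics, intrinsics, and the near-plane cutoff are assumptions), not the paper's implementation.

```python
import numpy as np

def points_in_mask(points_lidar: np.ndarray, T_cam_lidar: np.ndarray,
                   K: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Return the LiDAR points whose image projection falls inside an instance mask.

    points_lidar: (N, 3) points in the LiDAR frame.
    T_cam_lidar:  (4, 4) extrinsic transform from LiDAR to camera frame.
    K:            (3, 3) camera intrinsic matrix.
    mask:         (H, W) boolean instance mask from the segmentation network.
    """
    pts_h = np.hstack([points_lidar, np.ones((len(points_lidar), 1))])
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]
    in_front = pts_cam[:, 2] > 0.1                 # keep points in front of the camera
    uvw = (K @ pts_cam[in_front].T).T
    uv = (uvw[:, :2] / uvw[:, 2:3]).astype(int)    # perspective divide -> pixel coords
    h, w = mask.shape
    valid = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    hit = np.zeros(valid.shape, dtype=bool)
    hit[valid] = mask[uv[valid, 1], uv[valid, 0]]  # look up the mask at each projection
    return points_lidar[in_front][hit]
```

The selected points would then feed the clustering and box-fitting stage described in the abstract.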

18 pages, 14158 KB  
Article
Vision-Based Perception and Execution Decision-Making for Fruit Picking Robots Using Generative AI Models
by Yunhe Zhou, Chunjiang Yu, Jiaming Zhang, Yuanhang Liu, Jiangming Kan, Xiangjun Zou, Kang Zhang, Hanyan Liang, Sheng Zhang and Fengyun Wu
Machines 2026, 14(1), 117; https://doi.org/10.3390/machines14010117 - 19 Jan 2026
Viewed by 206
Abstract
At present, fruit picking mainly relies on manual operation. Taking the litchi (Litchi chinensis Sonn.)-picking robot as an example, visual perception is often affected by illumination variations, low recognition accuracy, complex maturity judgment, and occlusion, which lead to inaccurate fruit localization. This study aims to establish an embodied perception mechanism based on “perception-reasoning-execution” to enhance the visual perception and decision-making capability of the robot in complex orchard environments. First, a Y-LitchiC instance segmentation method is proposed to achieve high-precision segmentation of litchi clusters. Second, a generative artificial intelligence model is introduced to intelligently assess fruit maturity and occlusion, providing auxiliary support for automatic picking. Based on the auxiliary judgments provided by the generative AI model, two types of dynamic harvesting decisions are formulated for subsequent operations. For unoccluded main fruit-bearing branches, a skeleton thinning algorithm is applied within the segmented region to extract the skeleton line, and the midpoint of the skeleton is used to perform the first type of localization and harvesting decision. In contrast, for main fruit-bearing branches occluded by leaves, threshold-based segmentation combined with maximum connected component extraction is employed to obtain the target region, followed by skeleton thinning, thereby completing the second type of dynamic picking decision. Experimental results show that the Y-LitchiC model improves the mean average precision (mAP) by 1.6% compared with the YOLOv11s-seg model, achieving higher accuracy in litchi cluster segmentation and recognition. The generative artificial intelligence model provides higher-level reasoning and decision-making capabilities for automatic picking. Overall, the proposed embodied perception mechanism and dynamic picking strategies effectively enhance the autonomous perception and decision-making of the picking robot in complex orchard environments, providing a reliable theoretical basis and technical support for accurate fruit localization and precision picking. Full article
(This article belongs to the Special Issue Control Engineering and Artificial Intelligence)
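The skeleton-thinning and midpoint localization step for unoccluded fruit-bearing branches can be approximated with scikit-image; the sketch below runs skeletonize on a branch mask and takes a crude midpoint, which simplifies whatever ordering along the branch the authors actually use.

```python
import numpy as np
from skimage.morphology import skeletonize

def branch_pick_point(branch_mask: np.ndarray) -> tuple[int, int]:
    """Pick the midpoint of the skeleton of a fruit-bearing-branch mask.

    branch_mask is a non-empty boolean (H, W) array from the instance
    segmentation stage (here assumed to cover the main branch region).
    """
    skeleton = skeletonize(branch_mask)     # 1-pixel-wide centerline of the branch
    ys, xs = np.nonzero(skeleton)
    order = np.argsort(ys)                  # crude top-to-bottom ordering of skeleton pixels
    mid = order[len(order) // 2]            # midpoint along that ordering
    return int(xs[mid]), int(ys[mid])       # (u, v) candidate picking point in pixels
```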

21 pages, 83627 KB  
Article
Research on Urban Perception of Zhengzhou City Based on Interpretable Machine Learning
by Mengjing Zhang, Chen Pan, Xiaohua Huang, Lujia Zhang and Mengshun Lee
Buildings 2026, 16(2), 314; https://doi.org/10.3390/buildings16020314 - 11 Jan 2026
Viewed by 229
Abstract
Urban perception research has long focused on global metropolises, but has overlooked many cities with complex functions and spatial structures, resulting in insufficient universality of existing theories when facing diverse urban contexts. This study constructed an analytical framework that integrates street scene images and interpretable machine learning. Taking Zhengzhou City as the research object, it extracted street visual elements based on deep learning technology and systematically analyzed the formation mechanism of multi-dimensional urban perception by combining the LightGBM model and SHAP method. The main findings of the research are as follows: (1) The urban perception of Zhengzhou City shows a significant east–west difference with Zhongzhou Avenue as the boundary. Positive perceptions such as safety and vitality are concentrated in the central business district and historical districts, while negative perceptions are more common in the urban fringe areas with chaotic built environments and single functions. (2) The visibility of greenery, the openness of the sky and the continuity of the building interface are identified as key visual elements affecting perception, and their directions and intensities of action show significant differences across perception dimensions. (3) The influence of visual elements on perception has a complex mechanism of action. For instance, the promoting effect of greenery visibility on beauty perception tends to level off after reaching a certain threshold. The research results of this study can provide a quantitative basis and strategic reference for the improvement in urban space quality and humanized street design. Full article
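The LightGBM-plus-SHAP analysis pattern described here is straightforward to reproduce in outline; the snippet below uses synthetic feature ratios and a synthetic perception score purely for illustration, not the study's data.

```python
import lightgbm as lgb
import numpy as np
import shap

# X: per-street visual element ratios (greenery, sky, building interface, ...);
# y: a perception score (e.g., safety) aggregated from pairwise comparisons.
rng = np.random.default_rng(0)
X = rng.random((1000, 6))
y = 2.0 * X[:, 0] + 1.0 * X[:, 1] - 1.5 * X[:, 2] + rng.normal(0, 0.1, 1000)

model = lgb.LGBMRegressor(n_estimators=300, learning_rate=0.05)
model.fit(X, y)

explainer = shap.TreeExplainer(model)      # attributes each prediction to the features
shap_values = explainer.shap_values(X)     # shape (n_samples, n_features)
print(np.abs(shap_values).mean(axis=0))    # mean |SHAP| as a feature-importance summary
```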

31 pages, 6416 KB  
Article
FireMM-IR: An Infrared-Enhanced Multi-Modal Large Language Model for Comprehensive Scene Understanding in Remote Sensing Forest Fire Monitoring
by Jinghao Cao, Xiajun Liu and Rui Xue
Sensors 2026, 26(2), 390; https://doi.org/10.3390/s26020390 - 7 Jan 2026
Viewed by 364
Abstract
Forest fire monitoring in remote sensing imagery has long relied on traditional perception models that primarily focus on detection or segmentation. However, such approaches fall short in understanding complex fire dynamics, including contextual reasoning, fire evolution description, and cross-modal interpretation. With the rise of multi-modal large language models (MLLMs), it becomes possible to move beyond low-level perception toward holistic scene understanding that jointly reasons about semantics, spatial distribution, and descriptive language. To address this gap, we introduce FireMM-IR, a multi-modal large language model tailored for pixel-level scene understanding in remote-sensing forest-fire imagery. FireMM-IR incorporates an infrared-enhanced classification module that fuses infrared and visual modalities, enabling the model to capture fire intensity and hidden ignition areas under dense smoke. Furthermore, we design a mask-generation module guided by language-conditioned segmentation tokens to produce accurate instance masks from natural-language queries. To effectively learn multi-scale fire features, a class-aware memory mechanism is introduced to maintain contextual consistency across diverse fire scenes. We also construct FireMM-Instruct, a unified corpus of 83,000 geometrically aligned RGB–IR pairs with instruction-aligned descriptions, bounding boxes, and pixel-level annotations. Extensive experiments show that FireMM-IR achieves superior performance on pixel-level segmentation and strong results on instruction-driven captioning and reasoning, while maintaining competitive performance on image-level benchmarks. These results indicate that infrared–optical fusion and instruction-aligned learning are key to physically grounded understanding of wildfire scenes. Full article
(This article belongs to the Special Issue Remote Sensing and UAV Technologies for Environmental Monitoring)

16 pages, 1970 KB  
Article
LSON-IP: Lightweight Sparse Occupancy Network for Instance Perception
by Xinwang Zheng, Yuhang Cai, Lu Yang, Chengyu Lu and Guangsong Yang
World Electr. Veh. J. 2026, 17(1), 31; https://doi.org/10.3390/wevj17010031 - 7 Jan 2026
Viewed by 264
Abstract
The high computational demand of dense voxel representations severely limits current vision-centric 3D semantic occupancy prediction methods, despite their capacity for granular scene understanding. This challenge is particularly acute in safety-critical applications like autonomous driving, where accurately perceiving dynamic instances often takes precedence over capturing the static background. This paper challenges the paradigm of dense prediction for such instance-focused tasks. We introduce the LSON-IP, a framework that strategically avoids the computational expense of dense 3D grids. LSON-IP operates on a sparse set of 3D instance queries, which are initialized directly from multi-view 2D images. These queries are then refined by our novel Sparse Instance Aggregator (SIA), an attention-based module. The SIA incorporates rich multi-view features while simultaneously modeling inter-query relationships to construct coherent object representations. Furthermore, to obviate the need for costly 3D annotations, we pioneer a Differentiable Sparse Rendering (DSR) technique. DSR innovatively defines a continuous field from the sparse voxel output, establishing a differentiable bridge between our sparse 3D representation and 2D supervision signals through volume rendering. Extensive experiments on major autonomous driving benchmarks, including SemanticKITTI and nuScenes, validate our approach. LSON-IP achieves strong performance on key dynamic instance categories and competitive overall semantic completion, all while reducing computational overhead by over 60% compared to dense baselines. Our work thus paves the way for efficient, high-fidelity instance-aware 3D perception. Full article
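One common way to initialize sparse 3D instance queries from multi-view 2D images is to back-project 2D proposal centers with a coarse depth guess; the NumPy sketch below shows that generic lifting step and is not the LSON-IP initialization itself.

```python
import numpy as np

def init_3d_queries(centers_uv: np.ndarray, depths: np.ndarray,
                    K: np.ndarray, T_ego_cam: np.ndarray) -> np.ndarray:
    """Lift 2D detection centers to sparse 3D query reference points (illustrative).

    centers_uv: (Q, 2) pixel centers of 2D instance proposals.
    depths:     (Q,)   coarse depth guesses (e.g., from a small regression head).
    K:          (3, 3) camera intrinsics; T_ego_cam: (4, 4) camera-to-ego transform.
    """
    ones = np.ones((len(centers_uv), 1))
    rays = (np.linalg.inv(K) @ np.hstack([centers_uv, ones]).T).T  # unit-depth rays
    pts_cam = rays * depths[:, None]                               # scale rays by depth
    pts_h = np.hstack([pts_cam, ones])
    return (T_ego_cam @ pts_h.T).T[:, :3]                          # (Q, 3) points in ego frame
```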

24 pages, 578 KB  
Article
The Evolution of Spanish Ver ‘to See’ in Constructions with a Predicate Participle or Adjective
by Chantal Melis, María Isabel Jiménez Martínez and Milagros Alfonso Vega
Languages 2026, 11(1), 13; https://doi.org/10.3390/languages11010013 - 31 Dec 2025
Viewed by 433
Abstract
The focus in this corpus-based study is on a set of Spanish constructions formed with the verb of visual perception, ver ‘to see’, and a predicate adjective or participle. In addition to a clearly recognizable transitive schema, the set includes various instances featuring a reflexive clitic pronoun coreferential with the subject, some of which have been argued to evidence the grammaticalization of lexical ver into a univerbated semicopular verb (pronominal verse), meaning little more than ‘be’ in some examples, and proximate to the intransitive sense of English look in other cases. We trace the evolution of these constructions in data spanning the history of the Spanish language, from its recorded beginnings to the present. We establish the need to distinguish two constructional sources of change, namely, an old middle-reflexive and a younger reflexive passive. We draw attention to the “renewal” of the Latin deponent videri ‘appear, look, seem’, which can be said to have taken place in Spanish as a product of the passive-derived process of grammaticalization undergone by ver. Throughout the paper, we also address problems of analyzability, attributable to the superficially identical strings of words that characterize the constructional patterns with a reflexive morpheme. Full article

14 pages, 2571 KB  
Article
RMP: Robust Multi-Modal Perception Under Missing Condition
by Xin Ma, Xuqi Cai, Yuansheng Song, Yu Liang, Gang Liu and Yijun Yang
Electronics 2026, 15(1), 119; https://doi.org/10.3390/electronics15010119 - 26 Dec 2025
Viewed by 280
Abstract
Multi-modal perception is a core technology for edge devices to achieve safe and reliable environmental understanding in autonomous driving scenarios. In recent years, most approaches have focused on integrating complementary signals from diverse sensors, including cameras and LiDAR, to improve scene understanding in complex traffic environments, thereby attracting significant attention. However, in real-world applications, sensor failures frequently occur; for instance, cameras may malfunction in scenarios with poor illumination, which severely reduces the accuracy of perception models. To overcome this issue, we propose a robust multi-modal perception pipeline designed to improve model performance under missing modality conditions. Specifically, we design a missing feature reconstruction mechanism to reconstruct absent features by leveraging intra-modal common clues. Furthermore, we introduce a multi-modal adaptive fusion strategy to facilitate adaptive multi-modal integration through inter-modal feature interactions. Extensive experiments on the nuScenes benchmark demonstrate that our method achieves SOTA-level performance under missing-modality conditions. Full article
(This article belongs to the Special Issue Hardware and Software Co-Design in Intelligent Systems)

22 pages, 31566 KB  
Article
PodFormer: An Adaptive Transformer-Based Framework for Instance Segmentation of Mature Soybean Pods in Field Environments
by Lei Cai and Xuewu Shou
Electronics 2026, 15(1), 80; https://doi.org/10.3390/electronics15010080 - 24 Dec 2025
Viewed by 216
Abstract
Mature soybean pods exhibit high homogeneity in color and texture relative to straw and dead leaves, and instances are often densely occluded, posing significant challenges for accurate field segmentation. To address these challenges, this paper constructs a high-quality field-based mature soybean dataset and proposes an adaptive Transformer-based network, PodFormer, to improve segmentation performance under homogeneous backgrounds, dense distributions, and severe occlusions. PodFormer integrates three core innovations: (1) the Adaptive Wavelet Detail Enhancement (AWDE) module, which strengthens high-frequency boundary cues to alleviate weak-boundary ambiguities; (2) the Density-Guided Query Initialization (DGQI) module, which injects scale and density priors to enhance instance detection in both sparse and densely clustered regions; and (3) the Mask Feedback Gated Refinement (MFGR) layer, which leverages mask confidence to adaptively refine query updates, enabling more accurate separation of adhered or occluded instances. Experimental results show that PodFormer achieves relative improvements of 6.7% and 5.4% in mAP50 and mAP50-95, substantially outperforming state-of-the-art methods. It further demonstrates strong generalization capabilities on real-world field datasets and cross-domain wheat-ear datasets, thereby providing a reliable perception foundation for structural trait recognition in intelligent soybean harvesting systems. Full article
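Wavelet-based detail enhancement of the kind the AWDE module targets can be illustrated at the image level with PyWavelets: amplify the high-frequency sub-bands and reconstruct. The sketch below operates on a grayscale image with an arbitrary gain, whereas the paper presumably works on feature maps inside the network.

```python
import numpy as np
import pywt

def boost_high_frequency(gray: np.ndarray, gain: float = 1.5) -> np.ndarray:
    """Amplify wavelet detail coefficients to sharpen weak pod boundaries (toy example)."""
    # 1-level 2D Haar DWT: approximation plus horizontal/vertical/diagonal details.
    cA, (cH, cV, cD) = pywt.dwt2(gray.astype(np.float64), "haar")
    enhanced = pywt.idwt2((cA, (cH * gain, cV * gain, cD * gain)), "haar")
    # idwt2 may return one extra row/column for odd sizes; crop back and clip.
    return np.clip(enhanced, 0, 255)[: gray.shape[0], : gray.shape[1]]
```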

23 pages, 5771 KB  
Article
F3M: A Frequency-Domain Feature Fusion Module for Robust Underwater Object Detection
by Tianyi Wang, Haifeng Wang, Wenbin Wang, Kun Zhang, Baojiang Ye and Huilin Dong
J. Mar. Sci. Eng. 2026, 14(1), 20; https://doi.org/10.3390/jmse14010020 - 22 Dec 2025
Cited by 1 | Viewed by 398
Abstract
In this study, we propose the Frequency-domain Feature Fusion Module (F3M) to address the challenges of underwater object detection, where optical degradation—particularly high-frequency attenuation and low-frequency color distortion—significantly compromises performance. We critically re-evaluate the need for strict invertibility in detection-oriented frequency modeling. Traditional wavelet-based methods incur high computational redundancy to maintain signal reconstruction, whereas F3M introduces a lightweight “Separate–Project–Fuse” paradigm. This mechanism decouples low-frequency illumination artifacts from high-frequency structural cues via spatial approximation, enabling the recovery of fine-scale details like coral textures and debris boundaries without the overhead of channel expansion. We validate F3M’s versatility by integrating it into both Convolutional Neural Networks (YOLO) and Transformer-based detectors (RT-DETR). Evaluations on the SCoralDet dataset show consistent improvements: F3M enhances the lightweight YOLO11n by 3.5% mAP50 and increases RT-DETR-n’s localization accuracy (mAP50–95) from 0.514 to 0.532. Additionally, cross-domain validation on the deep-sea TrashCan-Instance dataset shows F3M achieving comparable accuracy to the larger YOLOv8n while requiring 13% fewer parameters and 20% fewer GFLOPs. This study confirms that frequency-domain modulation provides an efficient and widely applicable enhancement for real-time underwater perception. Full article
(This article belongs to the Section Ocean Engineering)
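A minimal "separate, project, fuse" block in the spirit described above can be written by estimating the low-frequency band with pooling (a spatial approximation), treating the residual as high frequency, projecting each band, and fusing. The PyTorch module below is an illustrative stand-in with assumed channel sizes and pooling factor, not the F3M implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SeparateProjectFuse(nn.Module):
    """Toy separate-project-fuse block: split features into low/high frequency by
    spatial approximation, project each band, then fuse (illustrative only)."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.low_proj = nn.Conv2d(channels, channels, 1)    # illumination / color band
        self.high_proj = nn.Conv2d(channels, channels, 1)   # edge / texture band
        self.fuse = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        low = F.interpolate(F.avg_pool2d(x, 4), size=x.shape[-2:],
                            mode="bilinear", align_corners=False)  # cheap low-frequency estimate
        high = x - low                                             # residual high-frequency cues
        return self.fuse(self.low_proj(low) + self.high_proj(high)) + x

out = SeparateProjectFuse(64)(torch.randn(1, 64, 64, 64))
print(out.shape)  # torch.Size([1, 64, 64, 64])
```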

20 pages, 8786 KB  
Article
Learning to Count Crowds from Low-Altitude Aerial Views via Point-Level Supervision and Feature-Adaptive Fusion
by Junzhe Mao, Lin Nai, Jinqi Bai, Chang Liu and Liangfeng Xu
Appl. Sci. 2025, 15(24), 13211; https://doi.org/10.3390/app152413211 - 17 Dec 2025
Viewed by 390
Abstract
Counting small, densely clustered objects from low-altitude aerial views is challenging due to large scale variations, complex backgrounds, and severe occlusion, which often degrade the performance of fully supervised or density-regression methods. To address these issues, we propose a weakly supervised crowd counting framework that leverages point-level supervision and a feature-adaptive fusion strategy to enhance perception under low-altitude aerial views. The network comprises a front-end feature extractor and a back-end fusion module. The front-end adopts the first 13 convolutional layers of VGG16-BN to capture multi-scale semantic features while preserving crucial spatial details. The back-end integrates a Feature-Adaptive Fusion module and a Multi-Scale Feature Aggregation module: the former dynamically adjusts fusion weights across scales to improve robustness to scale variation, and the latter aggregates multi-scale representations to better capture targets in dense, complex scenes. Point-level annotations serve as weak supervision to substantially reduce labeling cost while enabling accurate localization of small individual instances. Experiments on several public datasets, including ShanghaiTech Part A, ShanghaiTech Part B, and UCF_CC_50, demonstrate that our method surpasses existing mainstream approaches, effectively mitigating scale variation, background clutter, and occlusion, and providing an efficient and scalable weakly supervised solution for small-object counting. Full article
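Truncating VGG16-BN to its convolutional layers for use as a counting front-end can be done directly with torchvision; the helper below keeps the first num_convs convolutions together with their BN/ReLU/pooling companions. It is a generic sketch of the described front-end, not the authors' code.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16_bn

def build_frontend(num_convs: int = 13) -> nn.Sequential:
    """Keep the first `num_convs` conv layers of VGG16-BN (with their BN/ReLU/pool layers)."""
    features = vgg16_bn(weights=None).features   # pass ImageNet weights in practice
    layers, conv_count = [], 0
    for layer in features:
        if isinstance(layer, nn.Conv2d):
            conv_count += 1
            if conv_count > num_convs:
                break
        layers.append(layer)
    return nn.Sequential(*layers)

frontend = build_frontend()
feat = frontend(torch.randn(1, 3, 384, 384))
print(feat.shape)   # semantic feature map passed to the fusion back-end
```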

28 pages, 5016 KB  
Article
A Lightweight Improved YOLOv8-Based Method for Rebar Intersection Detection
by Rui Wang, Fangjun Shi, Yini She, Li Zhang, Kaifeng Lin, Longshun Fu and Jingkun Shi
Appl. Sci. 2025, 15(24), 12898; https://doi.org/10.3390/app152412898 - 7 Dec 2025
Viewed by 505
Abstract
As industrialized construction and smart building continue to advance, rebar-tying robots place higher demands on the real-time and accurate recognition of rebar intersections and their tying status. Existing deep learning-based detection methods generally rely on heavy backbone networks and complex feature-fusion structures, making it difficult to deploy them efficiently on resource-constrained mobile robots and edge devices, and there is also a lack of dedicated datasets for rebar intersections. In this study, 12,000 rebar mesh images were collected and annotated from two indoor scenes and one outdoor scene to construct a rebar-intersection dataset that supports both object detection and instance segmentation, enabling simultaneous learning of intersection locations and tying status. On this basis, a lightweight improved YOLOv8-based method for rebar intersection detection and segmentation is proposed. The original backbone is replaced with ShuffleNetV2, and a C2f_Dual residual module is introduced in the neck; the same improvements are further transferred to YOLOv8-seg to form a unified lightweight detection–segmentation framework for joint prediction of intersection locations and tying status. Experimental results show that, compared with the original YOLOv8L and several mainstream detectors, the proposed model achieves comparable or superior performance in terms of mAP@50, precision and recall, while reducing model size and computational cost by 51.2% and 58.1%, respectively, and significantly improving inference speed. The improved YOLOv8-seg also achieves satisfactory contour alignment and regional consistency for rebar regions and intersection masks. Owing to its combination of high accuracy and low resource consumption, the proposed method is well suited for deployment on edge-computing devices used in rebar-tying robots and construction quality inspection, providing an effective visual perception solution for intelligent construction. Full article
(This article belongs to the Special Issue Advances in Smart Construction and Intelligent Buildings)
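Training and running a stock YOLOv8 segmentation model on such a dataset follows the standard Ultralytics workflow shown below; the dataset config rebar.yaml and the image path are hypothetical, and the snippet uses the unmodified yolov8n-seg backbone rather than the paper's ShuffleNetV2/C2f_Dual variant.

```python
from ultralytics import YOLO

# Hypothetical dataset config "rebar.yaml" pointing at intersection images and
# polygon labels (e.g., classes: tied intersection, untied intersection).
model = YOLO("yolov8n-seg.pt")                         # stock lightweight baseline
model.train(data="rebar.yaml", epochs=100, imgsz=640, batch=16)

# Inference on a (hypothetical) site photo: boxes plus instance masks per intersection.
results = model.predict("site_photo.jpg", conf=0.4)
for r in results:
    print(r.boxes.cls, r.masks.xy if r.masks is not None else None)
```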

32 pages, 2250 KB  
Article
Divergent Role of AI in Social Development: A Comparative Study of Teachers’ and Students’ Perceptions in Online and Physical Classrooms
by Qianye Wen, Jianliang Wang, Zhuoqi Guo and Daniel Badulescu
Behav. Sci. 2025, 15(12), 1649; https://doi.org/10.3390/bs15121649 - 30 Nov 2025
Viewed by 841
Abstract
This study addresses a critical gap in understanding Artificial Intelligence (AI)’s role in education by empirically investigating and comparing the distinct perceptions of teachers and students regarding AI’s role in a comprehensive range of social development aspects in both online and physical classroom settings. In particular, we evaluated how teachers utilize AI in their teaching methods, namely, Communicative Language Teaching (CLT), the Direct Method (DL), Task-Based Language Teaching (TBLT), Content and Language Integrated Learning (CLIL), and Community Language Learning (CLL), and students in their learning methods, namely, Communicative Learning (CL), Immersive Learning (IL), Task-Based Collaborative Learning (TBCL), Content Integrated Learning (CIL), and Community-Based Reflective Learning (CBRL), to configure their social development. We interviewed 20 teachers (10 from online and 10 from physical classes) and 40 students (20 from online and 20 from physical classes) and evaluated their perceptions regarding AI usage in teaching and learning methods towards social development. The results of our study are convincing enough to suggest that both teachers and students perceive AI usage helpful in teaching models; however, variation in their perception is observed. Notably, the divergence in the perception of teachers and students with regard to AI’s role is a key observation of this study. For instance, the teachers perceived AI as a highly effective tool in fostering community building during online sessions; in contrast, the students viewed its role as being moderately effective. Likewise, the teachers perceived AI’s role as a critical tool in traditional classrooms rather than in virtual ones, whereas the students associated AI with online learning—in terms of digital tools, learning opportunities, and critical discussion—by rating its impact on social confidence and verbal–nonverbal communications significantly more strongly in physical settings. On the contrary, the teachers emphasized AI’s relevance to their self-confidence, emotional intelligence, and community engagement in online teaching platforms; yet, the ratings dropped to moderate in physical contexts. The students’ perceptions in this regard matched those of the teachers, as they also emphasized the importance of social confidence and overall well-being in physical classrooms, where the teachers’ assessment was comparatively low. These patterns provide analytical insights that are decisively valuable for designing AI-integrated pedagogical models that support social development within the educational environments. Full article
(This article belongs to the Special Issue Artificial Intelligence and Educational Psychology)

23 pages, 4770 KB  
Article
Multidimensional Street View Representation and Association Analysis for Exploring Human Subjective Perception Differences in East Asian and European Cities
by Shaojun Liu, Shaonan Zhu, Weitao Li, Yongbang Li and Yuting Dai
Land 2025, 14(12), 2343; https://doi.org/10.3390/land14122343 - 28 Nov 2025
Viewed by 424
Abstract
Urban landscapes exhibit significant regional differences shaped by geography, history, and culture, yet how these variations influence human perception remains underexplored. This study investigates the impact of street scene characteristics on human perceptions in East Asian and European cities by analyzing the large-scale MIT Place Pulse 2.0 dataset. We employ DeepLab v3+ and Mask R-CNN to extract multidimensional physical and visual features and utilize logistic regression to model their association with six subjective perceptions. The findings reveal significant cultural differences: streets in East Asian cities are characterized by higher compactness and brightness, whereas European city streets exhibit greater levels of greening and openness. While perceptions of aesthetics and liveliness show cross-cultural consistency, the mechanisms influencing safety and wealth perceptions diverge significantly; for instance, East Asian cities associate safety with road openness, while European cities favor greater enclosure. The study provides practical insights for creating urban environments that resonate with local cultural identities, enhancing well-being and supporting sustainable urban development. Full article
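The two-stage pattern of extracting visual features with a detector and then fitting a logistic regression against perception labels can be sketched as below; the Mask R-CNN call uses the stock torchvision model rather than the paper's setup, and the regression inputs and labels are synthetic placeholders.

```python
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from torchvision.models.detection import maskrcnn_resnet50_fpn

# Step 1 (sketch): count detected object instances in a street image with a stock Mask R-CNN.
detector = maskrcnn_resnet50_fpn(weights="DEFAULT").eval()
img = torch.rand(3, 480, 640)                       # placeholder street-view image in [0, 1]
with torch.no_grad():
    pred = detector([img])[0]
n_objects = int((pred["scores"] > 0.5).sum())       # one crude visual feature

# Step 2 (sketch): logistic regression linking visual features to a binary perception
# outcome (e.g., "chosen as safer" in a pairwise comparison).
rng = np.random.default_rng(0)
X = rng.random((500, 5))                            # greenery, sky, enclosure, ... ratios
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.2, 500) > 0.9).astype(int)
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.coef_)                                    # sign/magnitude indicate direction of association
```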
