Search Results (230)

Search Parameters:
Keywords = instance perception

21 pages, 3342 KB  
Article
Urban Flood Severity and Residents’ Participation in Disaster Relief: Evidence from Zhengzhou, China
by Mengmeng Zhang, Chenyu Zhang and Zimingdian Wang
Appl. Sci. 2025, 15(19), 10565; https://doi.org/10.3390/app151910565 - 30 Sep 2025
Viewed by 295
Abstract
As global climate change intensifies the frequency of extreme weather events, urban flood control and disaster reduction efforts face unprecedented challenges. With the limitations of traditional, top-down emergency management becoming increasingly apparent, many countries are actively incorporating community-based participation into flood risk governance. While research in this area is expanding, the specific impact of urban flood inundation severity on residents’ participation in relief efforts remains significantly underexplored. To address this research gap, this study employs the Community Capitals Framework (CCF) and a Gradient Boosting Decision Tree (GBDT) model to empirically analyze 1322 survey responses from Zhengzhou, China, exploring the non-linear relationship between flood severity and public participation. Our findings are threefold: (1) As the most direct source of residents’ risk perception, flood inundation severity has a significant association with their participation level. (2) This relationship is distinctly non-linear. For instance, inundation severity within a 200 m radius of a resident’s home shows a predominantly negative relation with participation level, with the negative effect lessening at extreme levels of inundation. The distance from inundated areas, conversely, exhibits an “S-shaped” curve. (3) Flood severity exhibits a significant reinforcement interaction with both communication technology levels and government organizational mobilization. This indicates that, during public crises like flash floods, robust information channels and effective organizational support are positively related to residents’ transition from passive to active participation. This study reveals the complex, non-linear associations between flood severity and civic engagement, providing theoretical support and practical insights for optimizing disaster policies and enhancing community resilience within the broader context of urban land management and sustainable development. Full article
(This article belongs to the Special Issue Human Geography in an Uncertain World: Challenges and Solutions)
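A GBDT model fits an ensemble of shallow trees, each one trained on the residuals of the ensemble so far, which is what lets it capture the non-linear relationships the abstract describes. As an illustrative sketch of the idea only (a toy depth-1 boosting loop, not the authors' survey-analysis pipeline), assuming NumPy:

```python
import numpy as np

def fit_stump(x, r):
    """Best single-feature threshold split minimizing squared error on residuals r."""
    best = (np.inf, 0, 0.0, r.mean(), r.mean())
    for j in range(x.shape[1]):
        for t in np.unique(x[:, j])[:-1]:          # exclude max so both sides are non-empty
            left = x[:, j] <= t
            lm, rm = r[left].mean(), r[~left].mean()
            err = ((r[left] - lm) ** 2).sum() + ((r[~left] - rm) ** 2).sum()
            if err < best[0]:
                best = (err, j, t, lm, rm)
    return best[1:]                                # (feature, threshold, left value, right value)

def gbdt_fit(x, y, rounds=20, lr=0.3):
    """Boost depth-1 trees: each round fits the current residuals."""
    pred = np.full(len(y), y.mean())
    stumps = []
    for _ in range(rounds):
        j, t, lm, rm = fit_stump(x, y - pred)
        pred += lr * np.where(x[:, j] <= t, lm, rm)
        stumps.append((j, t, lm, rm))
    return y.mean(), stumps

def gbdt_predict(model, x, lr=0.3):
    base, stumps = model
    pred = np.full(len(x), base)
    for j, t, lm, rm in stumps:
        pred += lr * np.where(x[:, j] <= t, lm, rm)
    return pred

# Toy demo: the ensemble recovers a sharp step that a single linear fit cannot.
x = np.linspace(0, 1, 20).reshape(-1, 1)
y = (x[:, 0] > 0.5).astype(float)
model = gbdt_fit(x, y)
pred = gbdt_predict(model, x)
print(np.max(np.abs(pred - y)))  # small residual after 20 rounds
```

Real analyses use library implementations (e.g. gradient-boosting packages) with deeper trees and partial-dependence plots to read off the non-linear effect shapes.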

26 pages, 10666 KB  
Article
FALS-YOLO: An Efficient and Lightweight Method for Automatic Brain Tumor Detection and Segmentation
by Liyan Sun, Linxuan Zheng and Yi Xin
Sensors 2025, 25(19), 5993; https://doi.org/10.3390/s25195993 - 28 Sep 2025
Viewed by 641
Abstract
Brain tumors are highly malignant diseases that severely threaten the nervous system and patients’ lives. MRI is a core technology for brain tumor diagnosis and treatment due to its high resolution and non-invasiveness. However, existing YOLO-based models face challenges in brain tumor MRI image detection and segmentation, such as insufficient multi-scale feature extraction and high computational resource consumption. This paper proposes an improved lightweight brain tumor detection and instance segmentation model named FALS-YOLO, based on YOLOv8n-Seg and integrating three key modules: FLRDown, AdaSimAM, and LSCSHN. FLRDown enhances multi-scale tumor perception, AdaSimAM suppresses noise and improves feature fusion, and LSCSHN achieves high-precision segmentation with reduced parameters and computational burden. Experiments on the tumor-otak dataset show that FALS-YOLO achieves Precision (B) of 0.892, Recall (B) of 0.858, mAP@0.5 (B) of 0.912 in detection, and Precision (M) of 0.899, Recall (M) of 0.863, mAP@0.5 (M) of 0.917 in segmentation, outperforming YOLOv5n-Seg, YOLOv8n-Seg, YOLOv9s-Seg, YOLOv10n-Seg and YOLOv11n-Seg. Compared with YOLOv8n-Seg, FALS-YOLO reduces parameters by 31.95%, computational amount by 20.00%, and model size by 32.31%. It provides an efficient, accurate and practical solution for the automatic detection and instance segmentation of brain tumors in resource-limited environments. Full article
(This article belongs to the Special Issue Emerging MRI Techniques for Enhanced Disease Diagnosis and Monitoring)
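The AdaSimAM module builds on SimAM-style parameter-free attention, which weights each activation by an inverse-energy term computed from per-channel statistics rather than by learned parameters. A minimal NumPy sketch of the standard SimAM formulation (the authors' adaptive variant is not reproduced here) looks like:

```python
import numpy as np

def simam(x, lam=1e-4):
    """Parameter-free SimAM attention over a (C, H, W) feature map."""
    n = x.shape[1] * x.shape[2] - 1
    mu = x.mean(axis=(1, 2), keepdims=True)       # per-channel spatial mean
    d = (x - mu) ** 2                             # squared deviation per unit
    var = d.sum(axis=(1, 2), keepdims=True) / n   # per-channel variance
    e_inv = d / (4 * (var + lam)) + 0.5           # inverse energy: outliers score high
    return x * (1.0 / (1.0 + np.exp(-e_inv)))     # sigmoid-gated features

feat = np.random.randn(8, 16, 16)
out = simam(feat)
print(out.shape)  # (8, 16, 16)
```

Because the weighting is derived from the feature statistics themselves, the module adds no parameters, which fits the paper's lightweight design goal.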

23 pages, 5234 KB  
Article
Instance Segmentation of LiDAR Point Clouds with Local Perception and Channel Similarity
by Xinmiao Du and Xihong Wu
Remote Sens. 2025, 17(18), 3239; https://doi.org/10.3390/rs17183239 - 19 Sep 2025
Viewed by 610
Abstract
Lidar point clouds are crucial for autonomous driving, but their sparsity and scale variations pose challenges for instance segmentation. In this paper, we propose LCPSNet, a Light Detection and Ranging (LiDAR) channel-aware point segmentation network designed to handle distance-dependent sparsity and scale variation in point clouds. A top-down FPN is adopted, where high-level features are progressively upsampled and fused with shallow layers. The fused features at 1/16, 1/8, and 1/4 are further aligned to a common BEV/polar grid and processed by the Local Perception Module (LPM), which applies cross-scale, position-dependent weighting to enhance intra-object coherence and suppress interference. The Inter-Channel Correlation Module (ICCM) employs ball queries to model spatial and channel correlations, computing an inter-channel similarity matrix to reduce redundancy and highlight valid features. Experiments on SemanticKITTI and Waymo show that LPM and ICCM effectively improve local feature refinement and global semantic consistency. LCPSNet achieves 70.9 PQ and 77.1 mIoU on SemanticKITTI, surpassing mainstream methods and reaching state-of-the-art performance. Full article
(This article belongs to the Section AI Remote Sensing)
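The ball query that the ICCM employs is a standard point-cloud neighborhood operator: gather up to k points lying within a fixed radius of a query center. An illustrative NumPy sketch (not the authors' implementation) of the basic operation:

```python
import numpy as np

def ball_query(points, center, radius, k):
    """Return indices of up to k points within `radius` of `center`."""
    dist = np.linalg.norm(points - center, axis=1)  # Euclidean distance to center
    idx = np.flatnonzero(dist <= radius)            # candidates inside the ball
    return idx[:k]                                  # cap the neighborhood size

pts = np.array([[0.0, 0, 0], [0.1, 0, 0], [5.0, 0, 0], [0, 0.2, 0]])
nbrs = ball_query(pts, np.array([0.0, 0, 0]), radius=0.5, k=3)
print(nbrs)  # [0 1 3] — the far point at x=5 is excluded
```

Fixed-radius queries keep neighborhood scale consistent regardless of the distance-dependent sparsity of LiDAR returns, which is why they are preferred over plain k-nearest-neighbor grouping in this setting.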

14 pages, 954 KB  
Article
A Benchmark for Symbolic Reasoning from Pixel Sequences: Grid-Level Visual Completion and Correction
by Lei Kang, Xuanshuo Fu, Mohamed Ali Souibgui, Andrey Barsky, Lluis Gomez, Javier Vazquez-Corral, Alicia Fornés, Ernest Valveny and Dimosthenis Karatzas
Mathematics 2025, 13(17), 2851; https://doi.org/10.3390/math13172851 - 4 Sep 2025
Viewed by 608
Abstract
Grid structured visual data such as forms, tables, and game boards require models that pair pixel level perception with symbolic consistency under global constraints. Recent Pixel Language Models (PLMs) map images to token sequences with promising flexibility, yet we find they generalize poorly when observable evidence becomes sparse or corrupted. We present GridMNIST-Sudoku, a benchmark that renders large numbers of Sudoku instances with style diverse handwritten digits and provides parameterized stress tracks for two tasks: Completion (predict missing cells) and Correction (detect and repair incorrect cells) across difficulty levels ranging from 1 to 90 altered positions in a 9 × 9 grid. Attention diagnostics on PLMs trained with conventional one dimensional positional encodings reveal weak structure awareness outside the natural Sudoku sparsity band. Motivated by these findings, we propose a lightweight Row-Column-Box (RCB) positional prior that injects grid aligned coordinates and combine it with simple sparsity and corruption augmentations. Trained only on the natural distribution, the resulting model substantially improves out of distribution accuracy across wide sparsity and corruption ranges while maintaining strong in distribution performance. Full article
(This article belongs to the Section E1: Mathematics and Computer Science)
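The Row-Column-Box (RCB) prior attaches grid-aligned coordinates to each cell; for a 9 × 9 Sudoku, the mapping from a flat cell index to its row, column, and 3 × 3 box is a few integer divisions. A sketch of that index arithmetic (the learned positional encoding built on top of these indices is not reproduced):

```python
def rcb_coords(i):
    """Map a flat 0..80 Sudoku cell index to (row, col, box) coordinates."""
    r, c = divmod(i, 9)                  # row-major layout
    box = (r // 3) * 3 + c // 3          # 3x3 blocks numbered 0..8, row-major
    return r, c, box

print(rcb_coords(0))   # (0, 0, 0) — top-left cell
print(rcb_coords(40))  # (4, 4, 4) — center cell sits in the center box
print(rcb_coords(80))  # (8, 8, 8) — bottom-right cell
```

Feeding all three coordinates to the model, instead of only a 1-D position, is what gives it awareness of the box constraint that plain sequence positions cannot express.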

25 pages, 487 KB  
Review
Deformable and Fragile Object Manipulation: A Review and Prospects
by Yicheng Zhu, David Yang and Yangming Lee
Sensors 2025, 25(17), 5430; https://doi.org/10.3390/s25175430 - 2 Sep 2025
Viewed by 1529
Abstract
Deformable object manipulation (DOM) is a primary bottleneck for the real-world application of autonomous robots, requiring advanced frameworks for sensing, perception, modeling, planning, and control. When fragile objects such as soft tissues or fruits are involved, ensuring safety becomes the paramount concern, fundamentally altering the manipulation problem from one of pure trajectory optimization to one of constrained optimization and real-time adaptive control. Existing DOM methodologies, however, often fall short of addressing fragility constraints as a core design feature, leading to significant gaps in real-time adaptiveness and generalization. This review systematically examines individual components in DOM with a focus on their effectiveness in handling fragile objects. We identified key limitations in current approaches and, based on this analysis, discussed a promising framework that utilizes both low-latency reflexive mechanisms and global optimization to dynamically adapt to specific object instances. Full article
(This article belongs to the Special Issue Advanced Robotic Manipulators and Control Applications)

21 pages, 12646 KB  
Article
A Vision-Based Information Processing Framework for Vineyard Grape Picking Using Two-Stage Segmentation and Morphological Perception
by Yifei Peng, Jun Sun, Zhaoqi Wu, Jinye Gao, Lei Shi and Zhiyan Shi
Horticulturae 2025, 11(9), 1039; https://doi.org/10.3390/horticulturae11091039 - 2 Sep 2025
Viewed by 515
Abstract
To achieve efficient vineyard grape picking, a vision-based information processing framework integrating two-stage segmentation with morphological perception is proposed. In the first stage, an improved YOLOv8s-seg model is employed for coarse segmentation, incorporating two key enhancements: first, a dynamic deformation feature aggregation module (DDFAM), which facilitates the extraction of complex structural and morphological features; and second, an efficient asymmetric decoupled head (EADHead), which improves boundary awareness while reducing parameter redundancy. Compared with mainstream segmentation models, the improved model achieves superior performance, attaining the highest mAP@0.5 of 86.75%, a lightweight structure with 10.34 M parameters, and a real-time inference speed of 10.02 ms per image. In the second stage, the fine segmentation of fruit stems is performed using an improved OTSU thresholding algorithm, which is applied to a single-channel image derived from the hue component of the HSV color space, thereby enhancing robustness under complex lighting conditions. Morphological features extracted from the preprocessed fruit stem, including centroid coordinates and a skeleton constructed via medial axis transform (MAT), are further utilized to establish the spatial relationships with a picking point and cutting axis. The visualization analysis confirms the high feasibility and adaptability of the proposed framework, providing essential technical support for the automation of grape harvesting. Full article
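The second stage's improved OTSU step starts from the classic Otsu criterion: choose the threshold on the hue-derived grayscale image that maximizes between-class variance. A plain NumPy sketch of vanilla Otsu (the paper's variant adds refinements beyond this baseline):

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's threshold for a uint8 image: maximize between-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                 # class-0 probability up to each level
    mu = np.cumsum(prob * np.arange(256))   # cumulative mean up to each level
    mu_t = mu[-1]                           # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1 - omega))
    sigma_b = np.nan_to_num(sigma_b)        # empty classes contribute nothing
    return int(np.argmax(sigma_b))

# Bimodal toy "image": dark stems vs. bright background separate cleanly.
img = np.concatenate([np.full(500, 30, np.uint8), np.full(500, 200, np.uint8)])
t = otsu_threshold(img)
print(t)  # 30 — any value in [30, 199] splits the modes; argmax returns the first
```

Applying this to the hue channel of the HSV image, as the paper does, decouples the threshold from brightness, which is what buys robustness under complex lighting.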

16 pages, 11354 KB  
Article
MTC-BEV: Semantic-Guided Temporal and Cross-Modal BEV Feature Fusion for 3D Object Detection
by Qiankai Xi, Li Ma, Jikai Zhang, Hongying Bai and Zhixing Wang
World Electr. Veh. J. 2025, 16(9), 493; https://doi.org/10.3390/wevj16090493 - 1 Sep 2025
Viewed by 759
Abstract
We propose MTC-BEV, a novel multi-modal 3D object detection framework for autonomous driving that achieves robust and efficient perception by combining spatial, temporal, and semantic cues. MTC-BEV integrates image and LiDAR features in the Bird’s-Eye View (BEV) space, where heterogeneous modalities are aligned and fused through the Bidirectional Cross-Modal Attention Fusion (BCAP) module with positional encodings. To model temporal consistency, the Temporal Fusion (TTFusion) module explicitly compensates for ego-motion and incorporates past BEV features. In addition, a segmentation-guided BEV enhancement projects 2D instance masks into BEV space, highlighting semantically informative regions. Experiments on the nuScenes dataset demonstrate that MTC-BEV achieves a nuScenes Detection Score (NDS) of 72.4% at 14.91 FPS, striking a favorable balance between accuracy and efficiency. These results confirm the effectiveness of the proposed design, highlighting the potential of semantic-guided cross-modal and temporal fusion for robust 3D object detection in autonomous driving. Full article
(This article belongs to the Special Issue Electric Vehicle Autonomous Driving Based on Image Recognition)

17 pages, 3054 KB  
Article
Building Instance Extraction via Multi-Scale Hybrid Dual-Attention Network
by Qingqing Hu, Yiran Peng, Chi Zhang, Yunqi Lin, KinTak U and Junming Chen
Buildings 2025, 15(17), 3102; https://doi.org/10.3390/buildings15173102 - 29 Aug 2025
Cited by 1 | Viewed by 449
Abstract
Accurate building instance segmentation from high-resolution remote sensing images remains challenging due to complex urban scenes featuring occlusions, irregular building shapes, and heterogeneous textures. To address these issues, we propose a novel Multi-Scale Hybrid Dual-Attention Network (MS-HDAN), which integrates a dual-stream encoder, multi-scale feature extraction, and a hybrid attention mechanism. Specifically, the encoder is designed with a Local Feature Extraction Pathway (LFEP) and a Global Context Modeling Pathway (GCMP), enabling simultaneous capture of structural details and long-range semantic dependencies. A Local-Global Collaborative Perception Enhancement Module (LG-CPEM) is introduced to fuse the outputs from both streams, enhancing contextual representation. The decoder adopts a hierarchical up-sampling structure with skip connections and incorporates a dual-attention module to refine boundary-level details and suppress background noise. Extensive experiments on benchmark urban building datasets demonstrate that MS-HDAN significantly outperforms existing state-of-the-art methods, particularly in handling densely distributed and structurally complex buildings. The proposed framework offers a robust and scalable solution for real-world applications, such as urban planning, where precise building segmentation is crucial. Full article

10 pages, 2952 KB  
Article
Weakly Supervised Monocular Fisheye Camera Distance Estimation with Segmentation Constraints
by Zhihao Zhang and Xuejun Yang
Electronics 2025, 14(17), 3429; https://doi.org/10.3390/electronics14173429 - 28 Aug 2025
Viewed by 547
Abstract
Monocular fisheye camera distance estimation is a crucial visual perception task for autonomous driving. Due to the practical challenges of acquiring precise depth annotations, existing self-supervised methods usually consist of a monocular distance model and an ego-motion predictor with the goal of minimizing a reconstruction matching loss. However, they suffer from inaccurate distance estimation in low-texture regions, especially road surfaces. In this paper, we introduce a weakly supervised learning strategy that incorporates semantic segmentation, instance segmentation, and optical flow as additional sources of supervision. In addition to the self-supervised reconstruction loss, we introduce a road surface flatness loss, an instance smoothness loss, and an optical flow loss to enhance the accuracy of distance estimation. We evaluate the proposed method on the WoodScape and SynWoodScape datasets, and it outperforms the self-supervised monocular baseline, FisheyeDistanceNet. Full article

23 pages, 2914 KB  
Article
Analyzing Women’s Security in Public Transportation in Developing Countries: A Case Study of Lahore City
by Hina Saleemi, Saadia Tabassum, Muhammad Ashraf Javid, Nazam Ali, Giovanni Tesoriere and Tiziana Campisi
Safety 2025, 11(3), 82; https://doi.org/10.3390/safety11030082 - 26 Aug 2025
Viewed by 2060
Abstract
Security concerns regarding women in developing nations are frequently highlighted due to the prevalence of harassment incidents, particularly within public transportation systems. In Pakistan, where women make up half of the population, this issue persists in various forms of harassment, both within local environments and public transportation systems. Therefore, this study aims to investigate the security challenges confronted by women within the public transportation system in the city of Lahore, Pakistan. To achieve this, a user perception survey was designed to focus on women’s security during travel and relevant socioeconomic factors. The collected responses were analyzed using descriptive analysis and factor analysis methods. Exploratory factor analysis (EFA) revealed five latent variables, each encapsulating distinct aspects of women’s security within public transportation environments. Subsequently, a structural model of comfort of using public transportation at night was developed using the results of the exploratory factor analysis. Our study’s results suggest that although many women express feeling safe during their travels, a considerable number have experienced instances of harassment. Generally, issues such as insufficient lighting during night travel and a lack of awareness about harassment emerge as primary concerns within Lahore’s current public transport system. The structural model results revealed that the latent variables of harassment, harassment reaction, bus stop station facility, and public transportation safety are statistically significant predictors (p < 0.05) of comfort of using public transportation at night. The findings emphasize the need for initiatives to reduce overcrowding, improve nighttime lighting and infrastructure, and strengthen awareness among users, along with prevention measures against harassment. Such an approach safeguards women’s physical security and enhances their overall well-being and empowerment in urban surroundings. Full article

27 pages, 7285 KB  
Article
Towards Biologically-Inspired Visual SLAM in Dynamic Environments: IPL-SLAM with Instance Segmentation and Point-Line Feature Fusion
by Jian Liu, Donghao Yao, Na Liu and Ye Yuan
Biomimetics 2025, 10(9), 558; https://doi.org/10.3390/biomimetics10090558 - 22 Aug 2025
Viewed by 778
Abstract
Simultaneous Localization and Mapping (SLAM) is a fundamental technique in mobile robotics, enabling autonomous navigation and environmental reconstruction. However, dynamic elements in real-world scenes—such as walking pedestrians, moving vehicles, and swinging doors—often degrade SLAM performance by introducing unreliable features that cause localization errors. In this paper, we define dynamic regions as areas in the scene containing moving objects, and dynamic features as the visual features extracted from these regions that may adversely affect localization accuracy. Inspired by biological perception strategies that integrate semantic awareness and geometric cues, we propose Instance-level Point-Line SLAM (IPL-SLAM), a robust visual SLAM framework for dynamic environments. The system employs YOLOv8-based instance segmentation to detect potential dynamic regions and construct semantic priors, while simultaneously extracting point and line features using Oriented FAST (Features from Accelerated Segment Test) and Rotated BRIEF (Binary Robust Independent Elementary Features), collectively known as ORB, and Line Segment Detector (LSD) algorithms. Motion consistency checks and angular deviation analysis are applied to filter dynamic features, and pose optimization is conducted using an adaptive-weight error function. A static semantic point cloud map is further constructed to enhance scene understanding. Experimental results on the TUM RGB-D dataset demonstrate that IPL-SLAM significantly outperforms existing dynamic SLAM systems—including DS-SLAM and ORB-SLAM2—in terms of trajectory accuracy and robustness in complex indoor environments. Full article
(This article belongs to the Section Biomimetic Design, Constructions and Devices)

22 pages, 627 KB  
Article
Social Capital Heterogeneity: Examining Farmer and Rancher Views About Climate Change Through Their Values and Network Diversity
by Michael Carolan
Agriculture 2025, 15(16), 1749; https://doi.org/10.3390/agriculture15161749 - 15 Aug 2025
Viewed by 677
Abstract
Agriculture plays a crucial role in discussions about environmental challenges because of its ecological footprint and high vulnerability to environmental shocks. To better understand the social and behavioral dynamics among food producers and their perceptions of climate change-related risks, this paper draws on forty-one in-depth, semi-structured interviews with farmers and ranchers in Colorado (USA). Leveraging the concept of social capital, the paper extends the concept analytically in a direction missed by previous research highlighting network structures, such as by focusing on its bonding, bridging, and linking characteristics. Instead, focus centers on the inclusiveness and diversity of values, beliefs, worldviews, and cultural orientations within those networks, arguing that these elements can be just as influential, if not more so in certain instances, than structural qualities. The concept of social capital heterogeneity is introduced to describe a network’s level of diversity and inclusivity. The findings do not question the importance of studying network structures when trying to understand how food producers respond to threats like climate change; an approach that remains useful for explaining social learning, technology adoption, and behavioral change. However, this method misses elements captured through a subjective, interpretivist perspective. With social capital heterogeneity, we can use social capital to explore why farmers and ranchers hold specific values and risk perceptions, peering deeper “within” networks, while tools like quantitative social network analysis software help map their structures from the “outside.” Additionally, social capital heterogeneity provides valuable insights into questions about “effective” agro-environmental governance. The paper concludes by discussing practical implications of the findings and reviewing the limitations of the research design. Full article

32 pages, 9674 KB  
Article
A Spatiotemporal Multimodal Framework for Air Pollution Prediction Based on Bayesian Optimization—Evidence from Sichuan, China
by Fengfan Zhang, Jiabei Hu and Ming Zeng
Atmosphere 2025, 16(8), 958; https://doi.org/10.3390/atmos16080958 - 11 Aug 2025
Viewed by 882
Abstract
In regions characterized by complex terrain and diverse pollution sources, high-precision air pollution prediction remains challenging due to nonlinear spatiotemporal coupling and the difficulty of modeling local pollutant agglomeration. To address these issues, this study proposes a CNN–LSTM–Transformer multimodal prediction framework integrated with Bayesian Optimization. First, the Local Moran’s Index (LMI) is introduced as a spatial perception feature and concatenated with pollutant concentration sequences before being input into the CNN module. This design enhances the model’s ability to identify local pollutant clustering and spatial heterogeneity. Second, the LSTM architecture adopts a dual-channel structure: the main channel employs bidirectional LSTM to extract temporal dependencies, while the auxiliary channel uses unidirectional LSTM to capture evolutionary trends. A Transformer with a multi-head attention mechanism is then introduced to perform global modeling. Bayesian Optimization is employed to automatically adjust key hyperparameters, thereby improving the model’s stability and convergence efficiency. Empirical results based on atmospheric pollution monitoring data from Sichuan Province during 2021–2024 demonstrate that the proposed model outperforms various mainstream methods in predicting six pollutants in Chengdu. For instance, the MAE for PM2.5 decreased by 14.9–22.1%, while the coefficient of determination (R2) remained stable between 87% and 89%. The accuracy decay rate across four-day forecasts was controlled within 12.4%. Furthermore, in PM2.5 generalization prediction tasks across four other cities—Yibin, Zigong, Nanchong, and Mianyang—the model exhibited superior stability and robustness, achieving an average R2 of 87.4%. These findings highlight the model’s long-term stability and regional generalization capability, offering reliable technical support for air pollution prediction and control strategies in Sichuan Province and potentially beyond. Full article
(This article belongs to the Special Issue Applications of Artificial Intelligence in Atmospheric Sciences)
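The Local Moran's Index used as the spatial-perception feature is a standard local spatial autocorrelation statistic: with centered values z and row-standardized weights w, I_i = z_i (Σ_j w_ij z_j) / m2, where m2 is the second moment of z. A minimal NumPy sketch of this textbook form (the paper's exact normalization may differ):

```python
import numpy as np

def local_morans_i(x, w):
    """Local Moran's I for values x (n,) with row-standardized weights w (n, n)."""
    z = x - x.mean()          # center the observations
    m2 = (z ** 2).mean()      # second moment about the mean
    return z * (w @ z) / m2   # positive I_i: site sits in a like-valued cluster

# Four sites on a line: two high-value sites adjacent to two low-value sites.
x = np.array([10.0, 10.0, 1.0, 1.0])
w = np.array([[0.0, 1.0, 0.0, 0.0],    # chain adjacency, rows sum to 1
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 1.0, 0.0]])
I = local_morans_i(x, w)
print(I)  # [1. 0. 0. 1.] — endpoints lie in like-valued clusters; the boundary sites score 0
```

A high positive I_i at a monitoring station flags a local pollutant hotspot or coldspot, which is exactly the clustering signal the CNN module is given as an extra channel.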

24 pages, 10715 KB  
Article
Deep Learning Empowers Smart Animal Husbandry: Precise Localization and Image Segmentation of Specific Parts of Sika Deer
by Caocan Zhu, Jinfan Wei, Tonghe Liu, He Gong, Juanjuan Fan and Tianli Hu
Agriculture 2025, 15(16), 1719; https://doi.org/10.3390/agriculture15161719 - 9 Aug 2025
Viewed by 649
Abstract
In precision livestock farming, synchronous and high-precision instance segmentation of multiple key body parts of sika deer serves as the core visual foundation for achieving automated health monitoring, behavior analysis, and automated antler collection. However, in real-world breeding environments, factors such as lighting changes, severe individual occlusion, pose diversity, and small targets pose severe challenges to the accuracy and robustness of existing segmentation models. To address these challenges, this study proposes an improved model, MPDF-DetSeg, based on YOLO11-seg. The model reconstructs its neck network, introducing the multipath diversion feature fusion pyramid network (MPDFPN). Its multipath feature fusion and cross-scale interaction mechanisms address the segmentation ambiguity caused by deer body occlusion and complex illumination. The designed depthwise separable extended residual module (DWEResBlock) improves the ability to represent details such as texture in specific parts of sika deer. Moreover, we adopt the MPDIoU loss function based on vertex geometry constraints to optimize the positioning accuracy of tilted targets. In this study, a dataset consisting of 1036 sika deer images was constructed, covering five categories, including antlers, heads (front/side views), and legs (front/rear legs), and used for method validation. Compared with the original YOLO11-seg model, the improved model made significant progress in several indicators: the mAP50 and mAP50-95 under the bounding-box metrics increased by 2.1% and 4.9%, respectively; the mAP50 and mAP50-95 under the mask metrics increased by 2.4% and 5.3%, respectively. In addition, the model reached an mIoU of 70.1% for image segmentation, showing the superiority of this method in the accurate detection and segmentation of specific parts of sika deer. This provides an effective and robust technical solution for realizing the multidimensional intelligent perception and automated applications of sika deer. Full article
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)
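The MPDIoU loss referenced above augments plain IoU with penalties on the squared distances between the matching top-left and bottom-right corners of the predicted and ground-truth boxes, normalized by the image diagonal. A minimal sketch of the metric itself (the loss is typically 1 − MPDIoU; training-side details are not reproduced):

```python
def mpdiou(box_a, box_b, img_w, img_h):
    """MPDIoU between two (x1, y1, x2, y2) boxes: IoU minus corner-distance penalties."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))   # intersection width
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))   # intersection height
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union > 0 else 0.0
    diag2 = img_w ** 2 + img_h ** 2                # image diagonal squared
    d1 = (ax1 - bx1) ** 2 + (ay1 - by1) ** 2       # top-left corner gap
    d2 = (ax2 - bx2) ** 2 + (ay2 - by2) ** 2       # bottom-right corner gap
    return iou - d1 / diag2 - d2 / diag2

print(mpdiou((10, 10, 50, 50), (10, 10, 50, 50), 640, 640))  # 1.0 for a perfect match
```

Because the two corner distances fix both position and size, the metric stays informative even for non-overlapping or tilted-target boxes, where plain IoU saturates at zero.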

29 pages, 3842 KB  
Article
SABE-YOLO: Structure-Aware and Boundary-Enhanced YOLO for Weld Seam Instance Segmentation
by Rui Wen, Wu Xie, Yong Fan and Lanlan Shen
J. Imaging 2025, 11(8), 262; https://doi.org/10.3390/jimaging11080262 - 6 Aug 2025
Cited by 1 | Viewed by 721
Abstract
Accurate weld seam recognition is essential in automated welding systems, as it directly affects path planning and welding quality. With the rapid advancement of industrial vision, weld seam instance segmentation has emerged as a prominent research focus in both academia and industry. However, existing approaches still face significant challenges in boundary perception and structural representation. Due to the inherently elongated shapes, complex geometries, and blurred edges of weld seams, current segmentation models often struggle to maintain high accuracy in practical applications. To address this issue, a novel structure-aware and boundary-enhanced YOLO (SABE-YOLO) is proposed for weld seam instance segmentation. First, a Structure-Aware Fusion Module (SAFM) is designed to enhance structural feature representation through strip pooling attention and element-wise multiplicative fusion, targeting the difficulty in extracting elongated and complex features. Second, a C2f-based Boundary-Enhanced Aggregation Module (C2f-BEAM) is constructed to improve edge feature sensitivity by integrating multi-scale boundary detail extraction, feature aggregation, and attention mechanisms. Finally, the inner minimum point distance-based intersection over union (Inner-MPDIoU) is introduced to improve localization accuracy for weld seam regions. Experimental results on the self-built weld seam image dataset show that SABE-YOLO outperforms YOLOv8n-Seg by 3 percentage points in the AP(50–95) metric, reaching 46.3%. Meanwhile, it maintains a low computational cost (18.3 GFLOPs) and a small number of parameters (6.6M), while achieving an inference speed of 127 FPS, demonstrating a favorable trade-off between segmentation accuracy and computational efficiency. The proposed method provides an effective solution for high-precision visual perception of complex weld seam structures and demonstrates strong potential for industrial application. Full article
(This article belongs to the Section Image and Video Processing)
