Search Results (230)

Search Parameters:
Keywords = instance perception

21 pages, 3342 KB  
Article
Urban Flood Severity and Residents’ Participation in Disaster Relief: Evidence from Zhengzhou, China
by Mengmeng Zhang, Chenyu Zhang and Zimingdian Wang
Appl. Sci. 2025, 15(19), 10565; https://doi.org/10.3390/app151910565 - 30 Sep 2025
Viewed by 295
Abstract
As global climate change intensifies the frequency of extreme weather events, urban flood control and disaster reduction efforts face unprecedented challenges. With the limitations of traditional, top-down emergency management becoming increasingly apparent, many countries are actively incorporating community-based participation into flood risk governance. While research in this area is expanding, the specific impact of urban flood inundation severity on residents’ participation in relief efforts remains significantly underexplored. To address this research gap, this study employs the Community Capitals Framework (CCF) and a Gradient Boosting Decision Tree (GBDT) model to empirically analyze 1322 survey responses from Zhengzhou, China, exploring the non-linear relationship between flood severity and public participation. Our findings are threefold: (1) As the most direct source of residents’ risk perception, flood inundation severity has a significant association with their participation level. (2) This relationship is distinctly non-linear. For instance, inundation severity within a 200 m radius of a resident’s home shows a predominantly negative relation with participation level, with the negative effect lessening at extreme levels of inundation. The distance from inundated areas, conversely, exhibits an “S-shaped” curve. (3) Flood severity exhibits a significant reinforcement interaction with both communication technology levels and government organizational mobilization. This indicates that, during public crises like flash floods, robust information channels and effective organizational support are positively related to residents’ transition from passive to active participation. This study reveals the complex, non-linear associations between flood severity and civic engagement, providing theoretical support and practical insights for optimizing disaster policies and enhancing community resilience within the broader context of urban land management and sustainable development. Full article
(This article belongs to the Special Issue Human Geography in an Uncertain World: Challenges and Solutions)
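A GBDT model fits an ensemble of shallow trees, each one trained on the residuals of the ensemble so far, which is what lets it capture the non-linear relationships the abstract describes. As an illustrative sketch of the idea only (a toy depth-1 boosting loop, not the authors' survey-analysis pipeline), assuming NumPy:

```python
import numpy as np

def fit_stump(x, r):
    """Best single-feature threshold split minimizing squared error on residuals r."""
    best = (np.inf, 0, 0.0, r.mean(), r.mean())
    for j in range(x.shape[1]):
        for t in np.unique(x[:, j])[:-1]:          # exclude max so both sides are non-empty
            left = x[:, j] <= t
            lm, rm = r[left].mean(), r[~left].mean()
            err = ((r[left] - lm) ** 2).sum() + ((r[~left] - rm) ** 2).sum()
            if err < best[0]:
                best = (err, j, t, lm, rm)
    return best[1:]                                # (feature, threshold, left value, right value)

def gbdt_fit(x, y, rounds=20, lr=0.3):
    """Boost depth-1 trees: each round fits the current residuals."""
    pred = np.full(len(y), y.mean())
    stumps = []
    for _ in range(rounds):
        j, t, lm, rm = fit_stump(x, y - pred)
        pred += lr * np.where(x[:, j] <= t, lm, rm)
        stumps.append((j, t, lm, rm))
    return y.mean(), stumps

def gbdt_predict(model, x, lr=0.3):
    base, stumps = model
    pred = np.full(len(x), base)
    for j, t, lm, rm in stumps:
        pred += lr * np.where(x[:, j] <= t, lm, rm)
    return pred

# Toy demo: the ensemble recovers a sharp step that a single linear fit cannot.
x = np.linspace(0, 1, 20).reshape(-1, 1)
y = (x[:, 0] > 0.5).astype(float)
model = gbdt_fit(x, y)
pred = gbdt_predict(model, x)
print(np.max(np.abs(pred - y)))  # small residual after 20 rounds
```

Real analyses use library implementations (e.g. gradient-boosting packages) with deeper trees and partial-dependence plots to read off the non-linear effect shapes.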

26 pages, 10666 KB  
Article
FALS-YOLO: An Efficient and Lightweight Method for Automatic Brain Tumor Detection and Segmentation
by Liyan Sun, Linxuan Zheng and Yi Xin
Sensors 2025, 25(19), 5993; https://doi.org/10.3390/s25195993 - 28 Sep 2025
Viewed by 641
Abstract
Brain tumors are highly malignant diseases that severely threaten the nervous system and patients’ lives. MRI is a core technology for brain tumor diagnosis and treatment due to its high resolution and non-invasiveness. However, existing YOLO-based models face challenges in brain tumor MRI image detection and segmentation, such as insufficient multi-scale feature extraction and high computational resource consumption. This paper proposes an improved lightweight brain tumor detection and instance segmentation model named FALS-YOLO, based on YOLOv8n-Seg and integrating three key modules: FLRDown, AdaSimAM, and LSCSHN. FLRDown enhances multi-scale tumor perception, AdaSimAM suppresses noise and improves feature fusion, and LSCSHN achieves high-precision segmentation with reduced parameters and computational burden. Experiments on the tumor-otak dataset show that FALS-YOLO achieves Precision (B) of 0.892, Recall (B) of 0.858, mAP@0.5 (B) of 0.912 in detection, and Precision (M) of 0.899, Recall (M) of 0.863, mAP@0.5 (M) of 0.917 in segmentation, outperforming YOLOv5n-Seg, YOLOv8n-Seg, YOLOv9s-Seg, YOLOv10n-Seg and YOLOv11n-Seg. Compared with YOLOv8n-Seg, FALS-YOLO reduces parameters by 31.95%, computational amount by 20.00%, and model size by 32.31%. It provides an efficient, accurate and practical solution for the automatic detection and instance segmentation of brain tumors in resource-limited environments. Full article
(This article belongs to the Special Issue Emerging MRI Techniques for Enhanced Disease Diagnosis and Monitoring)
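The AdaSimAM module builds on SimAM-style parameter-free attention, which weights each activation by an inverse-energy term computed from per-channel statistics rather than by learned parameters. A minimal NumPy sketch of the standard SimAM formulation (the authors' adaptive variant is not reproduced here) looks like:

```python
import numpy as np

def simam(x, lam=1e-4):
    """Parameter-free SimAM attention over a (C, H, W) feature map."""
    n = x.shape[1] * x.shape[2] - 1
    mu = x.mean(axis=(1, 2), keepdims=True)       # per-channel spatial mean
    d = (x - mu) ** 2                             # squared deviation per unit
    var = d.sum(axis=(1, 2), keepdims=True) / n   # per-channel variance
    e_inv = d / (4 * (var + lam)) + 0.5           # inverse energy: outliers score high
    return x * (1.0 / (1.0 + np.exp(-e_inv)))     # sigmoid-gated features

feat = np.random.randn(8, 16, 16)
out = simam(feat)
print(out.shape)  # (8, 16, 16)
```

Because the weighting is derived from the feature statistics themselves, the module adds no parameters, which fits the paper's lightweight design goal.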

23 pages, 5234 KB  
Article
Instance Segmentation of LiDAR Point Clouds with Local Perception and Channel Similarity
by Xinmiao Du and Xihong Wu
Remote Sens. 2025, 17(18), 3239; https://doi.org/10.3390/rs17183239 - 19 Sep 2025
Viewed by 610
Abstract
Lidar point clouds are crucial for autonomous driving, but their sparsity and scale variations pose challenges for instance segmentation. In this paper, we propose LCPSNet, a Light Detection and Ranging (LiDAR) channel-aware point segmentation network designed to handle distance-dependent sparsity and scale variation in point clouds. A top-down FPN is adopted, where high-level features are progressively upsampled and fused with shallow layers. The fused features at 1/16, 1/8, and 1/4 are further aligned to a common BEV/polar grid and processed by the Local Perception Module (LPM), which applies cross-scale, position-dependent weighting to enhance intra-object coherence and suppress interference. The Inter-Channel Correlation Module (ICCM) employs ball queries to model spatial and channel correlations, computing an inter-channel similarity matrix to reduce redundancy and highlight valid features. Experiments on SemanticKITTI and Waymo show that LPM and ICCM effectively improve local feature refinement and global semantic consistency. LCPSNet achieves 70.9 PQ and 77.1 mIoU on SemanticKITTI, surpassing mainstream methods and reaching state-of-the-art performance. Full article
(This article belongs to the Section AI Remote Sensing)
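The ball query that the ICCM employs is a standard point-cloud neighborhood operator: gather up to k points lying within a fixed radius of a query center. An illustrative NumPy sketch (not the authors' implementation) of the basic operation:

```python
import numpy as np

def ball_query(points, center, radius, k):
    """Return indices of up to k points within `radius` of `center`."""
    dist = np.linalg.norm(points - center, axis=1)  # Euclidean distance to center
    idx = np.flatnonzero(dist <= radius)            # candidates inside the ball
    return idx[:k]                                  # cap the neighborhood size

pts = np.array([[0.0, 0, 0], [0.1, 0, 0], [5.0, 0, 0], [0, 0.2, 0]])
nbrs = ball_query(pts, np.array([0.0, 0, 0]), radius=0.5, k=3)
print(nbrs)  # [0 1 3] — the far point at x=5 is excluded
```

Fixed-radius queries keep neighborhood scale consistent regardless of the distance-dependent sparsity of LiDAR returns, which is why they are preferred over plain k-nearest-neighbor grouping in this setting.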

14 pages, 954 KB  
Article
A Benchmark for Symbolic Reasoning from Pixel Sequences: Grid-Level Visual Completion and Correction
by Lei Kang, Xuanshuo Fu, Mohamed Ali Souibgui, Andrey Barsky, Lluis Gomez, Javier Vazquez-Corral, Alicia Fornés, Ernest Valveny and Dimosthenis Karatzas
Mathematics 2025, 13(17), 2851; https://doi.org/10.3390/math13172851 - 4 Sep 2025
Viewed by 608
Abstract
Grid structured visual data such as forms, tables, and game boards require models that pair pixel level perception with symbolic consistency under global constraints. Recent Pixel Language Models (PLMs) map images to token sequences with promising flexibility, yet we find they generalize poorly when observable evidence becomes sparse or corrupted. We present GridMNIST-Sudoku, a benchmark that renders large numbers of Sudoku instances with style diverse handwritten digits and provides parameterized stress tracks for two tasks: Completion (predict missing cells) and Correction (detect and repair incorrect cells) across difficulty levels ranging from 1 to 90 altered positions in a 9 × 9 grid. Attention diagnostics on PLMs trained with conventional one dimensional positional encodings reveal weak structure awareness outside the natural Sudoku sparsity band. Motivated by these findings, we propose a lightweight Row-Column-Box (RCB) positional prior that injects grid aligned coordinates and combine it with simple sparsity and corruption augmentations. Trained only on the natural distribution, the resulting model substantially improves out of distribution accuracy across wide sparsity and corruption ranges while maintaining strong in distribution performance. Full article
(This article belongs to the Section E1: Mathematics and Computer Science)
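The Row-Column-Box (RCB) prior attaches grid-aligned coordinates to each cell; for a 9 × 9 Sudoku, the mapping from a flat cell index to its row, column, and 3 × 3 box is a few integer divisions. A sketch of that index arithmetic (the learned positional encoding built on top of these indices is not reproduced):

```python
def rcb_coords(i):
    """Map a flat 0..80 Sudoku cell index to (row, col, box) coordinates."""
    r, c = divmod(i, 9)                  # row-major layout
    box = (r // 3) * 3 + c // 3          # 3x3 blocks numbered 0..8, row-major
    return r, c, box

print(rcb_coords(0))   # (0, 0, 0) — top-left cell
print(rcb_coords(40))  # (4, 4, 4) — center cell sits in the center box
print(rcb_coords(80))  # (8, 8, 8) — bottom-right cell
```

Feeding all three coordinates to the model, instead of only a 1-D position, is what gives it awareness of the box constraint that plain sequence positions cannot express.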

25 pages, 487 KB  
Review
Deformable and Fragile Object Manipulation: A Review and Prospects
by Yicheng Zhu, David Yang and Yangming Lee
Sensors 2025, 25(17), 5430; https://doi.org/10.3390/s25175430 - 2 Sep 2025
Viewed by 1529
Abstract
Deformable object manipulation (DOM) is a primary bottleneck for the real-world application of autonomous robots, requiring advanced frameworks for sensing, perception, modeling, planning, and control. When fragile objects such as soft tissues or fruits are involved, ensuring safety becomes the paramount concern, fundamentally altering the manipulation problem from one of pure trajectory optimization to one of constrained optimization and real-time adaptive control. Existing DOM methodologies, however, often fall short of addressing fragility constraints as a core design feature, leading to significant gaps in real-time adaptiveness and generalization. This review systematically examines individual components in DOM with a focus on their effectiveness in handling fragile objects. We identified key limitations in current approaches and, based on this analysis, discussed a promising framework that utilizes both low-latency reflexive mechanisms and global optimization to dynamically adapt to specific object instances. Full article
(This article belongs to the Special Issue Advanced Robotic Manipulators and Control Applications)

21 pages, 12646 KB  
Article
A Vision-Based Information Processing Framework for Vineyard Grape Picking Using Two-Stage Segmentation and Morphological Perception
by Yifei Peng, Jun Sun, Zhaoqi Wu, Jinye Gao, Lei Shi and Zhiyan Shi
Horticulturae 2025, 11(9), 1039; https://doi.org/10.3390/horticulturae11091039 - 2 Sep 2025
Viewed by 515
Abstract
To achieve efficient vineyard grape picking, a vision-based information processing framework integrating two-stage segmentation with morphological perception is proposed. In the first stage, an improved YOLOv8s-seg model is employed for coarse segmentation, incorporating two key enhancements: first, a dynamic deformation feature aggregation module (DDFAM), which facilitates the extraction of complex structural and morphological features; and second, an efficient asymmetric decoupled head (EADHead), which improves boundary awareness while reducing parameter redundancy. Compared with mainstream segmentation models, the improved model achieves superior performance, attaining the highest mAP@0.5 of 86.75%, a lightweight structure with 10.34 M parameters, and a real-time inference speed of 10.02 ms per image. In the second stage, the fine segmentation of fruit stems is performed using an improved OTSU thresholding algorithm, which is applied to a single-channel image derived from the hue component of the HSV color space, thereby enhancing robustness under complex lighting conditions. Morphological features extracted from the preprocessed fruit stem, including centroid coordinates and a skeleton constructed via medial axis transform (MAT), are further utilized to establish the spatial relationships with a picking point and cutting axis. The visualization analysis confirms the high feasibility and adaptability of the proposed framework, providing essential technical support for the automation of grape harvesting. Full article
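The second stage's improved OTSU step starts from the classic Otsu criterion: choose the threshold on the hue-derived grayscale image that maximizes between-class variance. A plain NumPy sketch of vanilla Otsu (the paper's variant adds refinements beyond this baseline):

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's threshold for a uint8 image: maximize between-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                 # class-0 probability up to each level
    mu = np.cumsum(prob * np.arange(256))   # cumulative mean up to each level
    mu_t = mu[-1]                           # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1 - omega))
    sigma_b = np.nan_to_num(sigma_b)        # empty classes contribute nothing
    return int(np.argmax(sigma_b))

# Bimodal toy "image": dark stems vs. bright background separate cleanly.
img = np.concatenate([np.full(500, 30, np.uint8), np.full(500, 200, np.uint8)])
t = otsu_threshold(img)
print(t)  # 30 — any value in [30, 199] splits the modes; argmax returns the first
```

Applying this to the hue channel of the HSV image, as the paper does, decouples the threshold from brightness, which is what buys robustness under complex lighting.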

16 pages, 11354 KB  
Article
MTC-BEV: Semantic-Guided Temporal and Cross-Modal BEV Feature Fusion for 3D Object Detection
by Qiankai Xi, Li Ma, Jikai Zhang, Hongying Bai and Zhixing Wang
World Electr. Veh. J. 2025, 16(9), 493; https://doi.org/10.3390/wevj16090493 - 1 Sep 2025
Viewed by 759
Abstract
We propose MTC-BEV, a novel multi-modal 3D object detection framework for autonomous driving that achieves robust and efficient perception by combining spatial, temporal, and semantic cues. MTC-BEV integrates image and LiDAR features in the Bird’s-Eye View (BEV) space, where heterogeneous modalities are aligned and fused through the Bidirectional Cross-Modal Attention Fusion (BCAP) module with positional encodings. To model temporal consistency, the Temporal Fusion (TTFusion) module explicitly compensates for ego-motion and incorporates past BEV features. In addition, a segmentation-guided BEV enhancement projects 2D instance masks into BEV space, highlighting semantically informative regions. Experiments on the nuScenes dataset demonstrate that MTC-BEV achieves a nuScenes Detection Score (NDS) of 72.4% at 14.91 FPS, striking a favorable balance between accuracy and efficiency. These results confirm the effectiveness of the proposed design, highlighting the potential of semantic-guided cross-modal and temporal fusion for robust 3D object detection in autonomous driving. Full article
(This article belongs to the Special Issue Electric Vehicle Autonomous Driving Based on Image Recognition)

17 pages, 3054 KB  
Article
Building Instance Extraction via Multi-Scale Hybrid Dual-Attention Network
by Qingqing Hu, Yiran Peng, Chi Zhang, Yunqi Lin, KinTak U and Junming Chen
Buildings 2025, 15(17), 3102; https://doi.org/10.3390/buildings15173102 - 29 Aug 2025
Cited by 1 | Viewed by 449
Abstract
Accurate building instance segmentation from high-resolution remote sensing images remains challenging due to complex urban scenes featuring occlusions, irregular building shapes, and heterogeneous textures. To address these issues, we propose a novel Multi-Scale Hybrid Dual-Attention Network (MS-HDAN), which integrates a dual-stream encoder, multi-scale feature extraction, and a hybrid attention mechanism. Specifically, the encoder is designed with a Local Feature Extraction Pathway (LFEP) and a Global Context Modeling Pathway (GCMP), enabling simultaneous capture of structural details and long-range semantic dependencies. A Local-Global Collaborative Perception Enhancement Module (LG-CPEM) is introduced to fuse the outputs from both streams, enhancing contextual representation. The decoder adopts a hierarchical up-sampling structure with skip connections and incorporates a dual-attention module to refine boundary-level details and suppress background noise. Extensive experiments on benchmark urban building datasets demonstrate that MS-HDAN significantly outperforms existing state-of-the-art methods, particularly in handling densely distributed and structurally complex buildings. The proposed framework offers a robust and scalable solution for real-world applications, such as urban planning, where precise building segmentation is crucial. Full article

10 pages, 2952 KB  
Article
Weakly Supervised Monocular Fisheye Camera Distance Estimation with Segmentation Constraints
by Zhihao Zhang and Xuejun Yang
Electronics 2025, 14(17), 3429; https://doi.org/10.3390/electronics14173429 - 28 Aug 2025
Viewed by 547
Abstract
Monocular fisheye camera distance estimation is a crucial visual perception task for autonomous driving. Due to the practical challenges of acquiring precise depth annotations, existing self-supervised methods usually consist of a monocular distance model and an ego-motion predictor with the goal of minimizing a reconstruction matching loss. However, they suffer from inaccurate distance estimation in low-texture regions, especially road surfaces. In this paper, we introduce a weakly supervised learning strategy that incorporates semantic segmentation, instance segmentation, and optical flow as additional sources of supervision. In addition to the self-supervised reconstruction loss, we introduce a road surface flatness loss, an instance smoothness loss, and an optical flow loss to enhance the accuracy of distance estimation. We evaluate the proposed method on the WoodScape and SynWoodScape datasets, and it outperforms the self-supervised monocular baseline, FisheyeDistanceNet. Full article

23 pages, 2914 KB  
Article
Analyzing Women’s Security in Public Transportation in Developing Countries: A Case Study of Lahore City
by Hina Saleemi, Saadia Tabassum, Muhammad Ashraf Javid, Nazam Ali, Giovanni Tesoriere and Tiziana Campisi
Safety 2025, 11(3), 82; https://doi.org/10.3390/safety11030082 - 26 Aug 2025
Viewed by 2060
Abstract
Security concerns regarding women in developing nations are frequently highlighted due to the prevalence of harassment incidents, particularly within public transportation systems. In Pakistan, where women make up half of the population, this issue persists in various forms of harassment, both within local environments and public transportation systems. Therefore, this study aims to investigate the security challenges confronted by women within the public transportation system in the city of Lahore, Pakistan. To achieve this, a user perception survey was designed to focus on women’s security during travel and relevant socioeconomic factors. The collected responses were analyzed using descriptive analysis and factor analysis methods. Exploratory factor analysis (EFA) revealed five latent variables, each encapsulating distinct aspects of women’s security within public transportation environments. Subsequently, a structural model of comfort of using public transportation at night was developed using the results of the exploratory factor analysis. Our study’s results suggest that although many women express feeling safe during their travels, a considerable number have experienced instances of harassment. Generally, issues such as insufficient lighting during night travel and a lack of awareness about harassment emerge as primary concerns within Lahore’s current public transport system. The structural model results revealed that the latent variables of harassment, harassment reaction, bus stop station facility, and public transportation safety are statistically significant predictors (p < 0.05) of comfort of using public transportation at night. The findings emphasize the need for initiatives to reduce overcrowding, improve nighttime lighting and infrastructure, and strengthen awareness among users, along with prevention measures against harassment. Such an approach safeguards women’s physical security and enhances their overall well-being and empowerment in urban surroundings. Full article

27 pages, 7285 KB  
Article
Towards Biologically-Inspired Visual SLAM in Dynamic Environments: IPL-SLAM with Instance Segmentation and Point-Line Feature Fusion
by Jian Liu, Donghao Yao, Na Liu and Ye Yuan
Biomimetics 2025, 10(9), 558; https://doi.org/10.3390/biomimetics10090558 - 22 Aug 2025
Viewed by 778
Abstract
Simultaneous Localization and Mapping (SLAM) is a fundamental technique in mobile robotics, enabling autonomous navigation and environmental reconstruction. However, dynamic elements in real-world scenes—such as walking pedestrians, moving vehicles, and swinging doors—often degrade SLAM performance by introducing unreliable features that cause localization errors. In this paper, we define dynamic regions as areas in the scene containing moving objects, and dynamic features as the visual features extracted from these regions that may adversely affect localization accuracy. Inspired by biological perception strategies that integrate semantic awareness and geometric cues, we propose Instance-level Point-Line SLAM (IPL-SLAM), a robust visual SLAM framework for dynamic environments. The system employs YOLOv8-based instance segmentation to detect potential dynamic regions and construct semantic priors, while simultaneously extracting point and line features using Oriented FAST (Features from Accelerated Segment Test) and Rotated BRIEF (Binary Robust Independent Elementary Features), collectively known as ORB, and Line Segment Detector (LSD) algorithms. Motion consistency checks and angular deviation analysis are applied to filter dynamic features, and pose optimization is conducted using an adaptive-weight error function. A static semantic point cloud map is further constructed to enhance scene understanding. Experimental results on the TUM RGB-D dataset demonstrate that IPL-SLAM significantly outperforms existing dynamic SLAM systems—including DS-SLAM and ORB-SLAM2—in terms of trajectory accuracy and robustness in complex indoor environments. Full article
(This article belongs to the Section Biomimetic Design, Constructions and Devices)

22 pages, 627 KB  
Article
Social Capital Heterogeneity: Examining Farmer and Rancher Views About Climate Change Through Their Values and Network Diversity
by Michael Carolan
Agriculture 2025, 15(16), 1749; https://doi.org/10.3390/agriculture15161749 - 15 Aug 2025
Viewed by 677
Abstract
Agriculture plays a crucial role in discussions about environmental challenges because of its ecological footprint and high vulnerability to environmental shocks. To better understand the social and behavioral dynamics among food producers and their perceptions of climate change-related risks, this paper draws on forty-one in-depth, semi-structured interviews with farmers and ranchers in Colorado (USA). Leveraging the concept of social capital, the paper extends the concept analytically in a direction missed by previous research highlighting network structures, such as by focusing on its bonding, bridging, and linking characteristics. Instead, focus centers on the inclusiveness and diversity of values, beliefs, worldviews, and cultural orientations within those networks, arguing that these elements can be just as influential, if not more so in certain instances, than structural qualities. The concept of social capital heterogeneity is introduced to describe a network’s level of diversity and inclusivity. The findings do not question the importance of studying network structures when trying to understand how food producers respond to threats like climate change; an approach that remains useful for explaining social learning, technology adoption, and behavioral change. However, this method misses elements captured through a subjective, interpretivist perspective. With social capital heterogeneity, we can use social capital to explore why farmers and ranchers hold specific values and risk perceptions, peering deeper “within” networks, while tools like quantitative social network analysis software help map their structures from the “outside.” Additionally, social capital heterogeneity provides valuable insights into questions about “effective” agro-environmental governance. The paper concludes by discussing practical implications of the findings and reviewing the limitations of the research design. Full article

32 pages, 9674 KB  
Article
A Spatiotemporal Multimodal Framework for Air Pollution Prediction Based on Bayesian Optimization—Evidence from Sichuan, China
by Fengfan Zhang, Jiabei Hu and Ming Zeng
Atmosphere 2025, 16(8), 958; https://doi.org/10.3390/atmos16080958 - 11 Aug 2025
Viewed by 882
Abstract
In regions characterized by complex terrain and diverse pollution sources, high-precision air pollution prediction remains challenging due to nonlinear spatiotemporal coupling and the difficulty of modeling local pollutant agglomeration. To address these issues, this study proposes a CNN–LSTM–Transformer multimodal prediction framework integrated with Bayesian Optimization. First, the Local Moran’s Index (LMI) is introduced as a spatial perception feature and concatenated with pollutant concentration sequences before being input into the CNN module. This design enhances the model’s ability to identify local pollutant clustering and spatial heterogeneity. Second, the LSTM architecture adopts a dual-channel structure: the main channel employs bidirectional LSTM to extract temporal dependencies, while the auxiliary channel uses unidirectional LSTM to capture evolutionary trends. A Transformer with a multi-head attention mechanism is then introduced to perform global modeling. Bayesian Optimization is employed to automatically adjust key hyperparameters, thereby improving the model’s stability and convergence efficiency. Empirical results based on atmospheric pollution monitoring data from Sichuan Province during 2021–2024 demonstrate that the proposed model outperforms various mainstream methods in predicting six pollutants in Chengdu. For instance, the MAE for PM2.5 decreased by 14.9–22.1%, while the coefficient of determination (R2) remained stable between 87% and 89%. The accuracy decay rate across four-day forecasts was controlled within 12.4%. Furthermore, in PM2.5 generalization prediction tasks across four other cities—Yibin, Zigong, Nanchong, and Mianyang—the model exhibited superior stability and robustness, achieving an average R2 of 87.4%. These findings highlight the model’s long-term stability and regional generalization capability, offering reliable technical support for air pollution prediction and control strategies in Sichuan Province and potentially beyond. Full article
(This article belongs to the Special Issue Applications of Artificial Intelligence in Atmospheric Sciences)
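The Local Moran's Index used as the spatial-perception feature is a standard local spatial autocorrelation statistic: with centered values z and row-standardized weights w, I_i = z_i (Σ_j w_ij z_j) / m2, where m2 is the second moment of z. A minimal NumPy sketch of this textbook form (the paper's exact normalization may differ):

```python
import numpy as np

def local_morans_i(x, w):
    """Local Moran's I for values x (n,) with row-standardized weights w (n, n)."""
    z = x - x.mean()          # center the observations
    m2 = (z ** 2).mean()      # second moment about the mean
    return z * (w @ z) / m2   # positive I_i: site sits in a like-valued cluster

# Four sites on a line: two high-value sites adjacent to two low-value sites.
x = np.array([10.0, 10.0, 1.0, 1.0])
w = np.array([[0.0, 1.0, 0.0, 0.0],    # chain adjacency, rows sum to 1
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 1.0, 0.0]])
I = local_morans_i(x, w)
print(I)  # [1. 0. 0. 1.] — endpoints lie in like-valued clusters; the boundary sites score 0
```

A high positive I_i at a monitoring station flags a local pollutant hotspot or coldspot, which is exactly the clustering signal the CNN module is given as an extra channel.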

24 pages, 10715 KB  
Article
Deep Learning Empowers Smart Animal Husbandry: Precise Localization and Image Segmentation of Specific Parts of Sika Deer
by Caocan Zhu, Jinfan Wei, Tonghe Liu, He Gong, Juanjuan Fan and Tianli Hu
Agriculture 2025, 15(16), 1719; https://doi.org/10.3390/agriculture15161719 - 9 Aug 2025
Viewed by 649
Abstract
In precision livestock farming, synchronous and high-precision instance segmentation of multiple key body parts of sika deer serves as the core visual foundation for achieving automated health monitoring, behavior analysis, and automated antler collection. However, in real-world breeding environments, factors such as lighting changes, severe individual occlusion, pose diversity, and small targets pose severe challenges to the accuracy and robustness of existing segmentation models. To address these challenges, this study proposes an improved model, MPDF-DetSeg, based on YOLO11-seg. The model reconstructs its neck network, introducing the multipath diversion feature fusion pyramid network (MPDFPN). Its multipath feature fusion and cross-scale interaction mechanisms address the segmentation ambiguity caused by deer body occlusion and complex illumination. The designed depthwise separable extended residual module (DWEResBlock) improves the ability to represent details such as texture in specific parts of sika deer. Moreover, we adopt the MPDIoU loss function based on vertex geometry constraints to optimize the positioning accuracy of tilted targets. In this study, a dataset consisting of 1036 sika deer images was constructed, covering five categories, including antlers, heads (front/side views), and legs (front/rear legs), and used for method validation. Compared with the original YOLO11-seg model, the improved model made significant progress in several indicators: the mAP50 and mAP50-95 under the bounding-box metrics increased by 2.1% and 4.9%, respectively; the mAP50 and mAP50-95 under the mask metrics increased by 2.4% and 5.3%, respectively. In addition, the model reached an mIoU of 70.1% for image segmentation, showing the superiority of this method in the accurate detection and segmentation of specific parts of sika deer. This provides an effective and robust technical solution for realizing the multidimensional intelligent perception and automated applications of sika deer. Full article
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)
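The MPDIoU loss referenced above augments plain IoU with penalties on the squared distances between the matching top-left and bottom-right corners of the predicted and ground-truth boxes, normalized by the image diagonal. A minimal sketch of the metric itself (the loss is typically 1 − MPDIoU; training-side details are not reproduced):

```python
def mpdiou(box_a, box_b, img_w, img_h):
    """MPDIoU between two (x1, y1, x2, y2) boxes: IoU minus corner-distance penalties."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))   # intersection width
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))   # intersection height
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union > 0 else 0.0
    diag2 = img_w ** 2 + img_h ** 2                # image diagonal squared
    d1 = (ax1 - bx1) ** 2 + (ay1 - by1) ** 2       # top-left corner gap
    d2 = (ax2 - bx2) ** 2 + (ay2 - by2) ** 2       # bottom-right corner gap
    return iou - d1 / diag2 - d2 / diag2

print(mpdiou((10, 10, 50, 50), (10, 10, 50, 50), 640, 640))  # 1.0 for a perfect match
```

Because the two corner distances fix both position and size, the metric stays informative even for non-overlapping or tilted-target boxes, where plain IoU saturates at zero.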

29 pages, 3842 KB  
Article
SABE-YOLO: Structure-Aware and Boundary-Enhanced YOLO for Weld Seam Instance Segmentation
by Rui Wen, Wu Xie, Yong Fan and Lanlan Shen
J. Imaging 2025, 11(8), 262; https://doi.org/10.3390/jimaging11080262 - 6 Aug 2025
Cited by 1 | Viewed by 721
Abstract
Accurate weld seam recognition is essential in automated welding systems, as it directly affects path planning and welding quality. With the rapid advancement of industrial vision, weld seam instance segmentation has emerged as a prominent research focus in both academia and industry. However, existing approaches still face significant challenges in boundary perception and structural representation. Due to the inherently elongated shapes, complex geometries, and blurred edges of weld seams, current segmentation models often struggle to maintain high accuracy in practical applications. To address this issue, a novel structure-aware and boundary-enhanced YOLO (SABE-YOLO) is proposed for weld seam instance segmentation. First, a Structure-Aware Fusion Module (SAFM) is designed to enhance structural feature representation through strip pooling attention and element-wise multiplicative fusion, targeting the difficulty in extracting elongated and complex features. Second, a C2f-based Boundary-Enhanced Aggregation Module (C2f-BEAM) is constructed to improve edge feature sensitivity by integrating multi-scale boundary detail extraction, feature aggregation, and attention mechanisms. Finally, the inner minimum point distance-based intersection over union (Inner-MPDIoU) is introduced to improve localization accuracy for weld seam regions. Experimental results on the self-built weld seam image dataset show that SABE-YOLO outperforms YOLOv8n-Seg by 3 percentage points in the AP(50–95) metric, reaching 46.3%. Meanwhile, it maintains a low computational cost (18.3 GFLOPs) and a small number of parameters (6.6M), while achieving an inference speed of 127 FPS, demonstrating a favorable trade-off between segmentation accuracy and computational efficiency. The proposed method provides an effective solution for high-precision visual perception of complex weld seam structures and demonstrates strong potential for industrial application. Full article
(This article belongs to the Section Image and Video Processing)
