Search Results (632)

Search Parameters:
Keywords = mask R-CNN

13 pages, 3910 KB  
Proceeding Paper
Grading Support System for Pear Fruit Using Edge Computing
by Ryo Ito, Shutaro Konuma and Tatsuya Yamazaki
Eng. Proc. 2025, 107(1), 45; https://doi.org/10.3390/engproc2025107045 - 1 Sep 2025
Abstract
Le Lectier pears (hereafter, Pears) are graded based on appearance, requiring farmers to inspect tens of thousands in a short time before shipment. To assist in this process, a grading support system was developed. The existing cloud-based system used mobile devices to capture images and analyzed them with Convolutional Neural Networks (CNNs) and texture-based algorithms. However, communication delays and algorithm inefficiencies resulted in a 30 s execution time, which was problematic in practice. This paper proposes an edge computing-based system using Mask R-CNN for appearance deterioration detection. Processing on edge servers reduces execution time to 5–10 s, and 39 out of 51 Pears were accurately detected.
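
As a minimal, illustrative sketch of the kind of Mask R-CNN inference stage such a system builds on (using an off-the-shelf COCO-pretrained torchvision model and a hypothetical input image, not the paper's own weights or defect classes):

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# COCO-pretrained Mask R-CNN; a deployed grading system would fine-tune on fruit/defect labels.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = to_tensor(Image.open("pear.jpg").convert("RGB"))  # hypothetical image path

with torch.no_grad():
    prediction = model([image])[0]

# Keep confident detections and binarize their soft instance masks.
keep = prediction["scores"] > 0.5
masks = prediction["masks"][keep, 0] > 0.5   # boolean masks, one per detected instance
boxes = prediction["boxes"][keep]            # corresponding bounding boxes
```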

20 pages, 5971 KB  
Article
A Novel UAV- and AI-Based Remote Sensing Approach for Quantitative Monitoring of Jellyfish Populations: A Case Study of Acromitus flagellatus in Qinglan Port
by Fang Zhang, Shuo Wang, Yanhao Qiu, Nan Wang, Song Sun and Hongsheng Bi
Remote Sens. 2025, 17(17), 3020; https://doi.org/10.3390/rs17173020 - 31 Aug 2025
Abstract
The frequency of jellyfish blooms in marine ecosystems has been rising globally, attracting significant attention from the scientific community and the general public. Low-altitude remote sensing with Unmanned Aerial Vehicles (UAVs) offers a promising approach for rapid, large-scale, and automated image acquisition, making it an effective tool for jellyfish population monitoring. This study employed UAVs for extensive sea surface surveys, achieving quantitative monitoring of the spatial distribution of jellyfish and optimizing flight altitude through gradient experiments. We developed a “bell diameter measurement model” for estimating jellyfish bell diameters from aerial images and used the Mask R-CNN algorithm to identify and count jellyfish automatically. This method was tested in Qinglan Port, where we monitored Acromitus flagellatus populations from mid-April to mid-May 2021 and late May 2023. Our results show that the UAVs can monitor jellyfish with bell diameters of 5 cm or more, and the optimal flight height is 100–150 m. The bell diameter measurement model, defined as L = 0.0103 × H × N + 0.1409, showed no significant deviation from field measurements. Compared to visual identification by human experts, the automated method achieved high accuracy while reducing labor and time costs. Case analysis revealed that the abundance of A. flagellatus in Qinglan Port initially increased and then decreased from mid-April to mid-May 2021, displaying a distinct patchy distribution. During this period, the average bell diameter gradually increased from 15.0 ± 3.4 cm to 15.5 ± 4.3 cm, with observed sizes ranging from 8.2 to 24.5 cm. This study introduces a novel, efficient, and cost-effective UAV-based method for quantitative monitoring of large jellyfish populations in surface waters, with broad applicability.
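
The reported bell diameter model is simple enough to state directly in code; treating H as the flight height in metres, N as the apparent bell size in image pixels, and L in centimetres is an assumption inferred from the abstract:

```python
def estimate_bell_diameter_cm(flight_height_m: float, pixel_count: float) -> float:
    """Bell diameter model reported in the abstract: L = 0.0103 * H * N + 0.1409.

    Assumption: H is the UAV flight height and N is the jellyfish bell extent in
    image pixels; exact units and variable definitions follow the paper.
    """
    return 0.0103 * flight_height_m * pixel_count + 0.1409

# Example: a bell spanning ~10 px in an image taken from 120 m gives roughly 12.5 cm.
print(estimate_bell_diameter_cm(120, 10))
```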

16 pages, 2127 KB  
Article
VIPS: Learning-View-Invariant Feature for Person Search
by Hexu Wang, Wenlong Luo, Wei Wu, Fei Xie, Jindong Liu, Jing Li and Shizhou Zhang
Sensors 2025, 25(17), 5362; https://doi.org/10.3390/s25175362 - 29 Aug 2025
Viewed by 88
Abstract
Unmanned aerial vehicles (UAVs) have become indispensable tools for surveillance, enabled by their ability to capture multi-perspective imagery in dynamic environments. Among critical UAV-based tasks, cross-platform person search—detecting and identifying individuals across distributed camera networks—presents unique challenges. Severe viewpoint variations, occlusions, and cluttered backgrounds in UAV-captured data degrade the performance of conventional discriminative models, which struggle to maintain robustness under such geometric and semantic disparities. To address this, we propose view-invariant person search (VIPS), a novel two-stage framework combining Faster R-CNN with a view-invariant re-identification (VIReID) module. Unlike conventional discriminative models, VIPS leverages the semantic flexibility of large vision–language models (VLMs) and adopts a two-stage training strategy to decouple and align text-based ID descriptors and visual features, enabling robust cross-view matching through shared semantic embeddings. To mitigate noise from occlusions and cluttered UAV-captured backgrounds, we introduce a learnable mask generator for feature purification. Furthermore, drawing from vision–language models, we design view prompts to explicitly encode perspective shifts into feature representations, enhancing adaptability to UAV-induced viewpoint changes. Extensive experiments on benchmark datasets demonstrate state-of-the-art performance, with ablation studies validating the efficacy of each component. Beyond technical advancements, this work highlights the potential of VLM-derived semantic alignment for UAV applications, offering insights for future research in real-time UAV-based surveillance systems.
(This article belongs to the Section Remote Sensors)
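
For orientation, the second (re-identification) stage of such a pipeline typically reduces to ranking gallery detections by embedding similarity; the NumPy sketch below shows that generic step and assumes embeddings have already been extracted (it is not the paper's VIReID module):

```python
import numpy as np

def match_query(query_feat: np.ndarray, gallery_feats: np.ndarray, top_k: int = 5):
    """Rank gallery detections by cosine similarity to a query embedding.

    query_feat:    (D,) embedding of the query person.
    gallery_feats: (N, D) embeddings of persons detected across cameras/UAV frames.
    """
    q = query_feat / np.linalg.norm(query_feat)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = g @ q                        # cosine similarities, shape (N,)
    order = np.argsort(-sims)[:top_k]   # indices of the most similar detections
    return order, sims[order]
```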

28 pages, 4317 KB  
Article
Multi-Scale Attention Networks with Feature Refinement for Medical Item Classification in Intelligent Healthcare Systems
by Waqar Riaz, Asif Ullah and Jiancheng (Charles) Ji
Sensors 2025, 25(17), 5305; https://doi.org/10.3390/s25175305 - 26 Aug 2025
Viewed by 391
Abstract
The increasing adoption of artificial intelligence (AI) in intelligent healthcare systems has elevated the demand for robust medical imaging and vision-based inventory solutions. For an intelligent healthcare inventory system, accurate recognition and classification of medical items, including medicines and emergency supplies, are crucial for ensuring inventory integrity and timely access to life-saving resources. This study presents a hybrid deep learning framework, EfficientDet-BiFormer-ResNet, that integrates three specialized components: EfficientDet’s Bidirectional Feature Pyramid Network (BiFPN) for scalable multi-scale object detection, BiFormer’s bi-level routing attention for context-aware spatial refinement, and ResNet-18 enhanced with triplet loss and Online Hard Negative Mining (OHNM) for fine-grained classification. The model was trained and validated on a custom healthcare inventory dataset comprising over 5000 images collected under diverse lighting, occlusion, and arrangement conditions. Quantitative evaluations demonstrated that the proposed system achieved a mean average precision (mAP@0.5:0.95) of 83.2% and a top-1 classification accuracy of 94.7%, outperforming conventional models such as YOLO, SSD, and Mask R-CNN. The framework excelled in recognizing visually similar, occluded, and small-scale medical items. This work advances real-time medical item detection in healthcare by providing an AI-enabled, clinically relevant vision system for medical inventory management.
(This article belongs to the Section Intelligent Sensors)
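
A common form of triplet loss with online hard negative mining is batch-hard mining; the PyTorch sketch below illustrates that general recipe and is not claimed to match the paper's exact formulation:

```python
import torch

def batch_hard_triplet_loss(embeddings: torch.Tensor, labels: torch.Tensor, margin: float = 0.3):
    """Triplet loss with batch-hard mining, one illustrative form of online hard negative mining.

    embeddings: (B, D) features; labels: (B,) class ids. For each anchor, use its
    hardest positive (farthest same-class sample) and hardest negative (closest
    different-class sample) within the batch.
    """
    dist = torch.cdist(embeddings, embeddings)           # (B, B) pairwise distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)     # (B, B) same-class mask
    eye = torch.eye(len(labels), dtype=torch.bool, device=labels.device)

    hardest_pos = (dist * (same & ~eye)).max(dim=1).values
    hardest_neg = dist.masked_fill(same, float("inf")).min(dim=1).values

    return torch.relu(hardest_pos - hardest_neg + margin).mean()
```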

21 pages, 7575 KB  
Article
Mapping Orchard Trees from UAV Imagery Through One Growing Season: A Comparison Between OBIA-Based and Three CNN-Based Object Detection Methods
by Maggi Kelly, Shane Feirer, Sean Hogan, Andy Lyons, Fengze Lin and Ewelina Jacygrad
Drones 2025, 9(9), 593; https://doi.org/10.3390/drones9090593 - 22 Aug 2025
Viewed by 367
Abstract
Extracting the shapes of individual tree crowns from high-resolution imagery can play a crucial role in many applications, including precision agriculture. We evaluated three CNN models—Mask R-CNN, YOLOv3, and SAM—and compared their tree crown results with OBIA-based reference datasets from UAV imagery for seven dates across one growing season. We found that YOLOv3 performed poorly across all dates; both Mask R-CNN and SAM performed well in May, June, September, and November (precision, recall, and F1 scores over 0.79). All models struggled in the early season imagery (e.g., March). Mask R-CNN outperformed other models in August (when there was smoke haze) and December (showing end-of-season variation in leaf color). SAM was the fastest model, and, as it required no training, it could cover more area in less time; Mask R-CNN was very accurate and customizable. In this paper, we aimed to contribute insight into which CNN model offers the best balance of accuracy and ease of implementation for orchard management tasks. We also evaluated its applicability within one software ecosystem, ESRI ArcGIS Pro, and showed how such an approach offers users a streamlined, efficient way to detect objects in high-resolution UAV imagery.
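
The precision, recall, and F1 scores quoted above follow the standard definitions once detections have been matched to reference crowns (e.g., by an IoU threshold); a minimal sketch:

```python
def detection_scores(tp: int, fp: int, fn: int):
    """Precision, recall, and F1 for detections matched against reference polygons.
    The tp/fp/fn counts are assumed to come from a prior IoU-based matching step."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Example: 85 matched crowns, 10 spurious detections, 12 missed crowns.
print(detection_scores(85, 10, 12))
```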

16 pages, 7955 KB  
Article
Development and Validation of a Computer Vision Dataset for Object Detection and Instance Segmentation in Earthwork Construction Sites
by JongHo Na, JaeKang Lee, HyuSoung Shin and IlDong Yun
Appl. Sci. 2025, 15(16), 9000; https://doi.org/10.3390/app15169000 - 14 Aug 2025
Viewed by 322
Abstract
Construction sites report the highest rate of industrial accidents, prompting the active development of smart safety management systems based on deep learning computer vision technology. To support the digital transformation of construction sites, securing site-specific datasets is essential. In this study, raw data were collected from an actual earthwork site. Key construction equipment and terrain objects primarily operated at the site were identified, and 89,766 images were processed to build a site-specific training dataset. This dataset includes annotated bounding boxes for object detection and polygon masks for instance segmentation. The performance of the dataset was validated using representative models—YOLO v7 for object detection and Mask R-CNN for instance segmentation. Quantitative metrics and visual assessments confirmed the validity and practical applicability of the dataset. The dataset used in this study has been made publicly available for use by researchers in related fields. This dataset is expected to serve as a foundational resource for advancing object detection applications in construction safety.
(This article belongs to the Section Civil Engineering)
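
For readers unfamiliar with such datasets, a single instance-segmentation record typically pairs a bounding box with a polygon mask; the example below uses the common COCO-style field names as an illustration, not the exact schema released with this dataset:

```python
# One COCO-style annotation combining a detection bbox and a segmentation polygon.
# Field names follow the widespread COCO convention; the category label is hypothetical.
annotation = {
    "image_id": 1024,
    "category_id": 3,                      # e.g., "excavator" in a site-specific label map
    "bbox": [412.0, 230.5, 310.0, 180.0],  # [x, y, width, height] in pixels
    "segmentation": [[412.0, 230.5, 722.0, 230.5, 722.0, 410.5, 412.0, 410.5]],
    "area": 55800.0,
    "iscrowd": 0,
}
```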

28 pages, 14601 KB  
Article
Balancing Accuracy and Computational Efficiency: A Faster R-CNN with Foreground-Background Segmentation-Based Spatial Attention Mechanism for Wild Plant Recognition
by Zexuan Cui, Zhibo Chen and Xiaohui Cui
Plants 2025, 14(16), 2533; https://doi.org/10.3390/plants14162533 - 14 Aug 2025
Viewed by 336
Abstract
Computer vision recognition technology, due to its non-invasive and convenient nature, can effectively avoid damage to fragile wild plants during recognition. However, balancing model complexity, recognition accuracy, and data processing difficulty on resource-constrained hardware is a critical issue that needs to be addressed. To tackle these challenges, we propose an improved lightweight Faster R-CNN architecture named ULS-FRCN. This architecture includes three key improvements: a Light Bottleneck module based on depthwise separable convolution to reduce model complexity; a Split SAM lightweight spatial attention mechanism to improve recognition accuracy without increasing model complexity; and unsharp masking preprocessing to enhance model performance while reducing data processing difficulty and training costs. We validated the effectiveness of ULS-FRCN using five representative wild plants from the PlantCLEF 2015 dataset. Ablation experiments and multi-dataset generalization tests show that ULS-FRCN significantly outperforms the baseline model in terms of mAP, mean F1 score, and mean recall, with improvements of 12.77%, 0.01, and 9.07%, respectively. Compared to the original Faster R-CNN, our lightweight design and attention mechanism reduce training parameters, improve inference speed, and enhance computational efficiency. This approach is suitable for deployment on resource-constrained forestry devices, enabling efficient plant identification and management without the need for high-performance servers.
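
Unsharp masking, the preprocessing step mentioned above, is a classic sharpening filter; a sketch with OpenCV, using illustrative parameters rather than the paper's settings:

```python
import cv2

def unsharp_mask(image, sigma: float = 2.0, amount: float = 1.0):
    """Classic unsharp masking: add back the difference between the image and a
    Gaussian-blurred copy. sigma and amount are illustrative defaults only."""
    blurred = cv2.GaussianBlur(image, (0, 0), sigma)
    return cv2.addWeighted(image, 1.0 + amount, blurred, -amount, 0)

# Example: sharpened = unsharp_mask(cv2.imread("plant.jpg"))
```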

25 pages, 9564 KB  
Article
Semantic-Aware Cross-Modal Transfer for UAV-LiDAR Individual Tree Segmentation
by Fuyang Zhou, Haiqing He, Ting Chen, Tao Zhang, Minglu Yang, Ye Yuan and Jiahao Liu
Remote Sens. 2025, 17(16), 2805; https://doi.org/10.3390/rs17162805 - 13 Aug 2025
Viewed by 372
Abstract
Cross-modal semantic segmentation of individual tree LiDAR point clouds is critical for accurately characterizing tree attributes, quantifying ecological interactions, and estimating carbon storage. However, in forest environments, this task faces key challenges such as high annotation costs and poor cross-domain generalization. To address these issues, this study proposes a cross-modal semantic transfer framework tailored for individual tree point cloud segmentation in forested scenes. Leveraging co-registered UAV-acquired RGB imagery and LiDAR data, we construct a technical pipeline of “2D semantic inference—3D spatial mapping—cross-modal fusion” to enable annotation-free semantic parsing of 3D individual trees. Specifically, we first introduce a novel Multi-Source Feature Fusion Network (MSFFNet) to achieve accurate instance-level segmentation of individual trees in the 2D image domain. Subsequently, we develop a hierarchical two-stage registration strategy to effectively align dense matched point clouds (MPC) generated from UAV imagery with LiDAR point clouds. On this basis, we propose a probabilistic cross-modal semantic transfer model that builds a semantic probability field through multi-view projection and the expectation–maximization algorithm. By integrating geometric features and semantic confidence, the model establishes semantic correspondences between 2D pixels and 3D points, thereby achieving spatially consistent semantic label mapping. This facilitates the transfer of semantic annotations from the 2D image domain to the 3D point cloud domain. The proposed method is evaluated on two forest datasets. The results demonstrate that the proposed individual tree instance segmentation approach achieves the highest performance, with an IoU of 87.60%, compared to state-of-the-art methods such as Mask R-CNN, SOLOv2, and Mask2Former. Furthermore, the cross-modal semantic label transfer framework significantly outperforms existing mainstream methods in individual tree point cloud semantic segmentation across complex forest scenarios.
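
The core of 2D-to-3D label transfer is projecting each LiDAR point into the labeled image and reading off the pixel label; the single-view sketch below illustrates that step (the paper itself fuses multiple views probabilistically with an EM-based semantic probability field):

```python
import numpy as np

def transfer_labels(points_xyz, label_map, K, R, t):
    """Project 3D points into a labeled image and copy the pixel label to each point.

    points_xyz: (N, 3) LiDAR points in world coordinates.
    label_map:  (H, W) per-pixel labels from the 2D segmentation.
    K, R, t:    pinhole intrinsics (3x3), rotation (3x3), translation (3,).
    Single-view illustration only; unlabeled or out-of-view points get -1.
    """
    cam = R @ points_xyz.T + t.reshape(3, 1)      # camera coordinates, (3, N)
    in_front = cam[2] > 0
    uv = (K @ cam)[:2] / cam[2]                   # pixel coordinates, (2, N)
    u, v = np.round(uv).astype(int)
    h, w = label_map.shape
    valid = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    labels = np.full(points_xyz.shape[0], -1, dtype=int)
    labels[valid] = label_map[v[valid], u[valid]]
    return labels
```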

54 pages, 2856 KB  
Review
Applications, Trends, and Challenges of Precision Weed Control Technologies Based on Deep Learning and Machine Vision
by Xiangxin Gao, Jianmin Gao and Waqar Ahmed Qureshi
Agronomy 2025, 15(8), 1954; https://doi.org/10.3390/agronomy15081954 - 13 Aug 2025
Viewed by 795
Abstract
Advanced computer vision (CV) and deep learning (DL) are essential for sustainable agriculture via automated vegetation management. This paper methodically reviews advancements in these technologies for agricultural settings, analyzing their fundamental principles, designs, system integration, and practical applications. The amalgamation of transformer topologies with convolutional neural networks (CNNs) in models such as YOLO (You Only Look Once) and Mask R-CNN (Region-Based Convolutional Neural Network) markedly enhances target recognition and semantic segmentation. The integration of LiDAR (Light Detection and Ranging) with multispectral imagery significantly improves recognition accuracy in intricate situations. Moreover, the integration of deep learning models with control systems, which include laser modules, robotic arms, and precision spray nozzles, facilitates the development of intelligent robotic mowing systems that significantly diminish chemical herbicide consumption and enhance operational efficiency relative to conventional approaches. Significant obstacles persist, including restricted environmental adaptability, real-time processing limitations, and inadequate model generalization. Future directions entail the integration of varied data sources, the development of streamlined models, and the enhancement of intelligent decision-making systems, establishing a framework for the advancement of sustainable agricultural technology.
(This article belongs to the Special Issue Research Progress in Agricultural Robots in Arable Farming)

22 pages, 825 KB  
Article
Conformal Segmentation in Industrial Surface Defect Detection with Statistical Guarantees
by Cheng Shen and Yuewei Liu
Mathematics 2025, 13(15), 2430; https://doi.org/10.3390/math13152430 - 28 Jul 2025
Viewed by 469
Abstract
Detection of surface defects can significantly extend mechanical service life and mitigate potential risks during safety management. Traditional defect detection methods predominantly rely on manual inspection, which suffers from low efficiency and high costs. Some machine learning algorithms and artificial intelligence models for defect detection, such as Convolutional Neural Networks (CNNs), present outstanding performance, but they are often data-dependent and cannot provide guarantees for new test samples. To this end, we construct a detection model by combining Mask R-CNN, selected for its strong baseline performance in pixel-level segmentation, with Conformal Risk Control. The former evaluates, based on probability, the distribution that discriminates defects from all samples. The detection model is improved by retraining with calibration data that is assumed to be independent and identically distributed (i.i.d.) with the test data. The latter constructs a prediction set on which a given guarantee for detection will be obtained. First, we define a loss function for each calibration sample to quantify detection error rates. Subsequently, we derive a statistically rigorous threshold by optimizing the error rates against a given significance level, which serves as the risk level. With this threshold, defective pixels with high probability in test images are extracted to construct prediction sets. This methodology ensures that the expected error rate on the test set remains strictly bounded by the predefined risk level. Furthermore, our model shows robust and efficient control over the expected test set error rate when calibration-to-test partitioning ratios vary.
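
The calibration step in conformal risk control amounts to choosing the least conservative threshold whose adjusted empirical risk stays below the target level; a generic sketch (the paper's specific pixel-level loss is not reproduced here):

```python
import numpy as np

def calibrate_threshold(loss_fn, calibration_items, candidate_thresholds, alpha: float, B: float = 1.0):
    """Return the first candidate threshold whose conformally adjusted empirical
    risk is at most alpha.

    loss_fn(item, lam) -> loss in [0, B] for one calibration sample at threshold lam.
    candidate_thresholds should be ordered from least to most conservative, so the
    returned value is the least conservative threshold meeting the target. This is
    the generic conformal risk control recipe, not the paper's exact procedure.
    """
    n = len(calibration_items)
    for lam in candidate_thresholds:
        risk = np.mean([loss_fn(item, lam) for item in calibration_items])
        if (n / (n + 1)) * risk + B / (n + 1) <= alpha:
            return lam
    return None  # no candidate meets the requested risk level
```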

20 pages, 4920 KB  
Article
Martian Skylight Identification Based on the Deep Learning Model
by Lihong Li, Lingli Mu, Wei Zhang, Weihua Dong and Yuqing He
Remote Sens. 2025, 17(15), 2571; https://doi.org/10.3390/rs17152571 - 24 Jul 2025
Viewed by 385
Abstract
As a type of distinctive pit on Mars, skylights are entrances to subsurface lava caves. They are very important for studying volcanic activity and potentially preserved water ice, and are also considered potential sites for future human extraterrestrial bases. Most skylights are identified manually, which is inefficient and highly subjective. Although deep learning methods have recently been used to identify skylights, they face the challenges of few effective samples and low identification accuracy. In this article, 151 positive samples and 920 negative samples based on MRO-HiRISE image data were used to create an initial skylight dataset, which contained few positive samples. To augment the initial dataset, StyleGAN2-ADA was selected to synthesize additional positive samples, producing an augmented dataset of 896 samples. On the basis of the augmented skylight dataset, we proposed YOLOv9-Skylight for skylight identification, incorporating Inner-EIoU loss and DySample to enhance localization accuracy and feature extraction ability. Compared with YOLOv9, the P, R, and F1 of YOLOv9-Skylight improved by about 9.1%, 2.8%, and 5.6%, respectively. Compared with other mainstream models such as YOLOv5, YOLOv10, Faster R-CNN, Mask R-CNN, and DETR, YOLOv9-Skylight achieved the highest accuracy (F1 = 92.5%), showing strong performance in skylight identification.
(This article belongs to the Special Issue Remote Sensing and Photogrammetry Applied to Deep Space Exploration)

16 pages, 2721 KB  
Article
An Adapter and Segmentation Network-Based Approach for Automated Atmospheric Front Detection
by Xinya Ding, Xuan Peng, Yanguang Xue, Liang Zhang, Tianying Wang and Yunpeng Zhang
Appl. Sci. 2025, 15(14), 7855; https://doi.org/10.3390/app15147855 - 14 Jul 2025
Viewed by 218
Abstract
This study presents AD-MRCNN, an advanced deep learning framework for automated atmospheric front detection that addresses two critical limitations in existing methods. First, current approaches directly input raw meteorological data without optimizing feature compatibility, potentially hindering model performance. Second, they typically only provide frontal category information without identifying individual frontal systems. Our solution integrates two key innovations: 1. An intelligent adapter module that performs adaptive feature fusion, automatically weighting and combining multi-source meteorological inputs (including temperature, wind fields, and humidity data) to maximize their synergistic effects while minimizing feature conflicts; the utilized network achieves an average improvement of over 4% across various metrics. 2. An enhanced instance segmentation network based on Mask R-CNN architecture that simultaneously achieves (1) precise frontal type classification (cold/warm/stationary/occluded), (2) accurate spatial localization, and (3) identification of distinct frontal systems. Comprehensive evaluation using ERA5 reanalysis data (2009–2018) demonstrates significant improvements, including an 85.1% F1-score, outperforming traditional methods (TFP: 63.1%) and deep learning approaches (Unet: 83.3%), and a 31% reduction in false alarms compared to semantic segmentation methods. The framework’s modular design allows for potential application to other meteorological feature detection tasks. Future work will focus on incorporating temporal dynamics for frontal evolution prediction.
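
A schematic stand-in for the idea of an adapter that learns how strongly to weight each meteorological field before detection might look as follows in PyTorch; the module below is illustrative only and not the paper's architecture:

```python
import torch
import torch.nn as nn

class WeightedFieldFusion(nn.Module):
    """Fuse several meteorological input fields with learnable, softmax-normalized
    weights before the detection backbone. Schematic stand-in, not AD-MRCNN's adapter."""

    def __init__(self, num_fields: int, in_channels: int, out_channels: int):
        super().__init__()
        self.weights = nn.Parameter(torch.zeros(num_fields))
        self.proj = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, fields):
        # fields: list of (B, C, H, W) tensors, e.g. temperature, wind, humidity grids.
        w = torch.softmax(self.weights, dim=0)
        fused = sum(wi * f for wi, f in zip(w, fields))
        return self.proj(fused)
```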

17 pages, 3331 KB  
Article
Automated Cattle Head and Ear Pose Estimation Using Deep Learning for Animal Welfare Research
by Sueun Kim
Vet. Sci. 2025, 12(7), 664; https://doi.org/10.3390/vetsci12070664 - 13 Jul 2025
Viewed by 634
Abstract
With the increasing importance of animal welfare, behavioral indicators such as changes in head and ear posture are widely recognized as non-invasive and field-applicable markers for evaluating the emotional state and stress levels of animals. However, traditional visual observation methods are often subjective, as assessments can vary between observers, and are unsuitable for long-term, quantitative monitoring. This study proposes an artificial intelligence (AI)-based system for the detection and pose estimation of cattle heads and ears using deep learning techniques. The system integrates Mask R-CNN for accurate object detection and FSA-Net for robust 3D pose estimation (yaw, pitch, and roll) of cattle heads and left ears. Comprehensive datasets were constructed from images of Japanese Black cattle, collected under natural conditions and annotated for both detection and pose estimation tasks. The proposed framework achieved mean average precision (mAP) values of 0.79 for head detection and 0.71 for left ear detection, and a mean absolute error (MAE) of approximately 8–9° for pose estimation, demonstrating reliable performance across diverse orientations. This approach enables long-term, quantitative, and objective monitoring of cattle behavior, offering significant advantages over traditional subjective stress assessment methods. The developed system holds promise for practical applications in animal welfare research and real-time farm management.

21 pages, 2471 KB  
Article
Attention-Based Mask R-CNN Enhancement for Infrared Image Target Segmentation
by Liang Wang and Kan Ren
Symmetry 2025, 17(7), 1099; https://doi.org/10.3390/sym17071099 - 9 Jul 2025
Viewed by 600
Abstract
Image segmentation is an important method in image processing, and infrared (IR) image segmentation remains one of the challenges in this field due to the unique characteristics of IR data. Infrared imaging utilizes the infrared radiation emitted by objects to produce images, which can complement visible-light imaging under adverse lighting conditions to some extent. However, the low spatial resolution and limited texture details in IR images hinder the achievement of high-precision segmentation. To address these issues, an attention mechanism based on symmetrical cross-channel interaction—motivated by symmetry principles in computer vision—was integrated into a Mask Region-Based Convolutional Neural Network (Mask R-CNN) framework. A Bottleneck-enhanced Squeeze-and-Attention (BNSA) module was incorporated into the backbone network, and novel loss functions were designed for both the bounding box (Bbox) regression and mask prediction branches to enhance segmentation performance. Furthermore, a dedicated infrared image dataset was constructed to validate the proposed method. The experimental results demonstrate that the optimized model achieves higher segmentation accuracy than the original network and other mainstream segmentation models on our dataset, showing how symmetrical design principles can effectively improve complex vision tasks.
(This article belongs to the Special Issue Symmetry and Its Applications in Computer Vision)
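
For context, the simplest form of channel attention that squeeze-and-attention-style modules build on is the squeeze-and-excitation pattern; a brief PyTorch sketch (not the paper's BNSA module itself):

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention: globally pool, pass through a
    small bottleneck MLP, and rescale channels. BNSA adds a bottleneck and further
    interaction on top of this basic idea."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        scale = self.fc(x.mean(dim=(2, 3))).view(b, c, 1, 1)  # squeeze, then excite
        return x * scale
```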

21 pages, 33500 KB  
Article
Location Research and Picking Experiment of an Apple-Picking Robot Based on Improved Mask R-CNN and Binocular Vision
by Tianzhong Fang, Wei Chen and Lu Han
Horticulturae 2025, 11(7), 801; https://doi.org/10.3390/horticulturae11070801 - 6 Jul 2025
Viewed by 552
Abstract
With the advancement of agricultural automation technologies, apple-harvesting robots have gradually become a focus of research. As their “perceptual core,” machine vision systems directly determine picking success rates and operational efficiency. However, existing vision systems still exhibit significant shortcomings in target detection and positioning accuracy in complex orchard environments (e.g., uneven illumination, foliage occlusion, and fruit overlap), which hinders practical applications. This study proposes a visual system for apple-harvesting robots based on improved Mask R-CNN and binocular vision to achieve more precise fruit positioning. The binocular camera (ZED2i) carried by the robot acquires dual-channel apple images. An improved Mask R-CNN is employed to implement instance segmentation of apple targets in binocular images, followed by a template-matching algorithm with parallel epipolar constraints for stereo matching. Four pairs of feature points from corresponding apples in binocular images are selected to calculate disparity and depth. Experimental results demonstrate average coefficients of variation and positioning accuracy of 5.09% and 99.61%, respectively, in binocular positioning. During harvesting operations with a self-designed apple-picking robot, the single-image processing time was 0.36 s, the average single harvesting cycle duration reached 7.7 s, and the comprehensive harvesting success rate was 94.3%. This work presents a novel, high-precision visual positioning method for apple-harvesting robots.
(This article belongs to the Section Fruit Production Systems)
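
Once left/right feature points are matched under the epipolar constraint, depth follows from the standard stereo relation Z = f·B/d; a minimal sketch with illustrative numbers (not the ZED2i calibration):

```python
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Standard pinhole stereo relation Z = f * B / d, converting the disparity of a
    matched feature pair into depth. Parameter values below are illustrative only."""
    return focal_px * baseline_m / disparity_px

# Example: f = 700 px, baseline = 0.12 m, disparity = 35 px -> depth of 2.4 m.
print(depth_from_disparity(700, 0.12, 35))
```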
