Search Results (656)

Search Parameters:
Keywords = mask R-CNN

12 pages, 3541 KB  
Article
INWELD—An Industrial Dataset for Object Detection and Instance Segmentation of Weld Images in Production Scenarios
by Xu Zhang, Qingchun Zheng, Peihao Zhu and Wenpeng Ma
Appl. Sci. 2025, 15(22), 12033; https://doi.org/10.3390/app152212033 (registering DOI) - 12 Nov 2025
Abstract
Welding is one of the most common machining methods in the industrial field, and weld grinding is a key task in the industrial manufacturing process. Although several weld-image datasets exist, most provide only coarse annotations and have limited scale and diversity. To address this gap, we constructed INWELD, a comprehensive multi-category weld dataset captured under real-world production conditions, providing both single-label and multi-label annotations. The dataset covers various types of welds and is evenly divided according to production needs. The proposed multi-category annotation method can predict the weld geometry and welding method without additional calculation and is applied to object detection and instance segmentation tasks. To evaluate the applicability of this dataset, we utilized the mainstream algorithms CenterNet and YOLOv7 for object detection, as well as Mask R-CNN, Deep Snake, and YOLACT for instance segmentation. The experimental results show that in single-category annotation, the AP50 of CenterNet and YOLOv7 is close to 90%, and the AP50 of Mask R-CNN and Deep Snake is greater than 80%. In multi-category annotation, the AP50 of CenterNet and YOLOv7 is greater than 80%, and the AP50 of Deep Snake and YOLACT is nearly 70%. The INWELD dataset constructed in this paper fills the gap in industrial weld surface images, lays the theoretical foundation for the intelligent research of welds, and provides data support and research direction for the development of automatic grinding and polishing of welds. Full article
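
As a point of reference for how such instance-segmentation baselines are typically run, the sketch below shows off-the-shelf Mask R-CNN inference with torchvision. It is not the authors' code; the image path and the commented-out fine-tuned checkpoint are placeholders, and a model trained on INWELD would need its own class definitions.

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Off-the-shelf Mask R-CNN with COCO weights; a weld model would be fine-tuned first.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
# model.load_state_dict(torch.load("maskrcnn_inweld.pth"))  # hypothetical INWELD checkpoint
model.eval()

image = Image.open("weld_sample.jpg").convert("RGB")   # placeholder image path
with torch.no_grad():
    pred = model([to_tensor(image)])[0]                # dict with boxes, labels, scores, masks

keep = pred["scores"] > 0.5                            # simple confidence threshold
masks = pred["masks"][keep] > 0.5                      # (N, 1, H, W) boolean instance masks
print(f"{masks.shape[0]} instances above threshold")
```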

24 pages, 3200 KB  
Article
Enhancing Boundary Precision and Long-Range Dependency Modeling in Medical Imaging via Unified Attention Framework
by Yi Zhu, Yawen Zhu, Hongtao Ma, Bin Li, Luyao Xiao, Xiaxu Wu and Manzhou Li
Electronics 2025, 14(21), 4335; https://doi.org/10.3390/electronics14214335 - 5 Nov 2025
Viewed by 271
Abstract
This study addresses the common challenges in medical image segmentation and recognition, including boundary ambiguity, scale variation, and the difficulty of modeling long-range dependencies, by proposing a unified framework based on a hierarchical attention mechanism. The framework consists of a local detail attention module, a global context attention module, and a cross-scale consistency constraint module, which collectively enable adaptive weighting and collaborative optimization across different feature levels, thereby achieving a balance between detail preservation and global modeling. The framework was systematically validated on multiple public datasets, and the results demonstrated that the proposed method achieved Dice, IoU, Precision, Recall, and F1 scores of 0.886, 0.781, 0.898, 0.875, and 0.886, respectively, on the combined dataset, outperforming traditional models such as U-Net, Mask R-CNN, DeepLabV3+, SegNet, and TransUNet. On the BraTS dataset, the proposed method achieved a Dice score of 0.922, Precision of 0.930, and Recall of 0.915, exhibiting superior boundary modeling capability in complex brain MRI images. On the LIDC-IDRI dataset, the Dice score and Recall were improved from 0.751 and 0.732 to 0.822 and 0.807, respectively, effectively reducing the missed detection rate of small nodules compared to traditional convolutional models. On the ISIC dermoscopy dataset, the proposed framework achieved a Dice score of 0.914 and a Precision of 0.922, significantly improving the accuracy of skin lesion recognition. The ablation study further revealed that local detail attention significantly enhanced boundary and texture modeling, global context attention strengthened long-range dependency capture, and cross-scale consistency constraints ensured the stability and coherence of prediction results. From a medical economics perspective, the proposed framework has the potential to reduce diagnostic costs and improve healthcare efficiency by enabling faster and more accurate image-based clinical decision-making. In summary, the hierarchical attention mechanism presented in this work not only provides an innovative breakthrough in mathematical modeling but also demonstrates outstanding performance and generalization ability in experiments, offering new perspectives and technical pathways for intelligent segmentation and recognition in medical imaging. Full article
(This article belongs to the Special Issue Application of Machine Learning in Graphics and Images, 2nd Edition)
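
The Dice, IoU, precision, recall, and F1 figures quoted above are standard overlap metrics. As a generic reference (not the paper's implementation), they can be computed from a pair of binary masks as follows:

```python
import numpy as np

def seg_metrics(pred, gt, eps=1e-7):
    """Dice, IoU, precision, recall and F1 for a pair of binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    return {
        "dice": 2 * tp / (2 * tp + fp + fn + eps),
        "iou": tp / (tp + fp + fn + eps),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall + eps),
    }

# Quick self-check on random masks.
pred = np.random.rand(256, 256) > 0.5
gt = np.random.rand(256, 256) > 0.5
print(seg_metrics(pred, gt))
```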

16 pages, 10443 KB  
Article
A Machine Learning-Based Model for Classifying the Shape of Tomato
by Trang-Thi Ho, Rosdyana Mangir Irawan Kusuma, Van Lam Ho and Hsiang Yin Wen
AgriEngineering 2025, 7(11), 373; https://doi.org/10.3390/agriengineering7110373 - 5 Nov 2025
Viewed by 215
Abstract
Most fruit classification studies rely on color-based features, but shape-based analysis provides a promising alternative for distinguishing subtle variations within the same variety. Tomato shape classification is challenging due to irregular contours, variable imaging conditions, and difficulty in extracting consistent geometric features. In this study, we propose an efficient and structured workflow to address these challenges through contour-based analysis. The process begins with the application of a Mask Region-based Convolutional Neural Network (Mask R-CNN) model to accurately isolate tomatoes from the background. Subsequently, the segmented tomatoes are extracted and encoded using Elliptic Fourier Descriptors (EFDs) to capture detailed shape characteristics. These features are used to train a range of machine learning models, including Support Vector Machine (SVM), Random Forest, One-Dimensional Convolutional Neural Network (1D-CNN), and Bidirectional Encoder Representations from Transformers (BERT). Experimental results show that the Random Forest model achieved the highest accuracy of 79.4%. This approach offers a robust, interpretable, and quantitative framework for tomato shape classification, reducing manual labor and supporting practical agricultural applications. Full article
(This article belongs to the Topic Digital Agriculture, Smart Farming and Crop Monitoring)
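
A rough sketch of the shape-feature stage described above, assuming a tomato has already been segmented into a binary mask (e.g., by Mask R-CNN): contours are encoded as Elliptic Fourier Descriptors and fed to a Random Forest. The pyefd package, the harmonic order, and the synthetic ellipse data are illustrative assumptions, not the authors' setup.

```python
import cv2
import numpy as np
from pyefd import elliptic_fourier_descriptors   # third-party EFD package (assumed available)
from sklearn.ensemble import RandomForestClassifier

def efd_features(mask, order=10):
    """Flattened, normalized EFD coefficients of the largest contour in a binary mask."""
    contours, _ = cv2.findContours(mask.astype(np.uint8),
                                   cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    contour = max(contours, key=cv2.contourArea).squeeze(1)           # (N, 2) points
    coeffs = elliptic_fourier_descriptors(contour, order=order, normalize=True)
    return coeffs.flatten()[3:]   # first three normalized coefficients are constant

# Synthetic stand-in data: elongated vs. round elliptical "tomatoes".
masks, labels = [], []
for ax1, ax2, lbl in [(80, 40, "oblong"), (85, 45, "oblong"), (60, 58, "round"), (62, 60, "round")]:
    m = np.zeros((200, 200), np.uint8)
    cv2.ellipse(m, (100, 100), (ax1, ax2), 0, 0, 360, 1, -1)
    masks.append(m)
    labels.append(lbl)

X = np.stack([efd_features(m) for m in masks])
clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, labels)
print(clf.predict(X[:1]))
```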

18 pages, 2465 KB  
Article
Comparison of Mask-R-CNN and Thresholding-Based Segmentation for High-Throughput Phenotyping of Walnut Kernel Color
by Steven H. Lee, Sean McDowell, Charles Leslie, Kristina McCreery, Mason Earles and Patrick J. Brown
Plants 2025, 14(21), 3335; https://doi.org/10.3390/plants14213335 - 31 Oct 2025
Viewed by 322
Abstract
High-throughput phenotyping has become essential for plant breeding programs, replacing traditional methods that rely on subjective scales influenced by human judgment. Machine learning (ML) computer vision systems have successfully used convolutional neural networks (CNNs) for image segmentation, providing greater flexibility than thresholding methods that may require carefully staged images. This study compares two quantitative image analysis methods, rule-based thresholding using the magick package in R and an instance-segmentation pipeline based on the widely used Mask-R-CNN architecture, and then compares the output of each to two different sets of human evaluations. Walnuts were collected over three years from over 3000 individual trees maintained by the UC Davis walnut breeding program. The resulting 90,961 kernels were placed into 100-cell trays and imaged using a 20-megapixel Basler camera with a Sony IMX183 sensor. Quantitative data from both image analysis methods were highly correlated for both lightness (L*; r2 = 0.997) and size (r2 = 0.984). The thresholding method required many manual adjustments to account for minor discrepancies in staging, while the CNN method was robust after a rapid initial training on only 13 images. The two human scoring methods were not highly correlated with the image analysis methods or with each other. Pixel classification provides data similar to human color assessments but offers greater consistency across different years. The thresholding approach offers flexibility and has been applied to other color-based phenotyping tasks, while the CNN approach can be adapted to images that are not perfectly staged and be retrained to quantify more subtle kernel characteristics such as spotting and shrivel. Full article
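
For readers unfamiliar with the rule-based branch, a minimal thresholding-and-measurement sketch in the same spirit (scikit-image here rather than the R magick package used in the paper; the file name, threshold choice, and size cutoff are placeholders) might look like:

```python
import numpy as np
from skimage import io, color, measure
from skimage.filters import threshold_otsu

rgb = io.imread("walnut_tray.jpg")                 # placeholder path to a staged tray image
lab = color.rgb2lab(rgb)
gray = color.rgb2gray(rgb)
mask = gray > threshold_otsu(gray)                 # assumes kernels are brighter than background
labels = measure.label(mask)

for region in measure.regionprops(labels):
    if region.area < 500:                          # arbitrary cutoff to ignore specks
        continue
    rr, cc = region.coords[:, 0], region.coords[:, 1]
    print(f"kernel {region.label}: L* = {lab[rr, cc, 0].mean():.1f}, area = {region.area} px")
```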

27 pages, 4104 KB  
Article
CropCLR-Wheat: A Label-Efficient Contrastive Learning Architecture for Lightweight Wheat Pest Detection
by Yan Wang, Chengze Li, Chenlu Jiang, Mingyu Liu, Shengzhe Xu, Binghua Yang and Min Dong
Insects 2025, 16(11), 1096; https://doi.org/10.3390/insects16111096 - 25 Oct 2025
Viewed by 1052
Abstract
To address prevalent challenges in field-based wheat pest recognition—namely, viewpoint perturbations, sample scarcity, and heterogeneous data distributions—a pest identification framework named CropCLR-Wheat is proposed, which integrates self-supervised contrastive learning with an attention-enhanced mechanism. By incorporating a viewpoint-invariant feature encoder and a diffusion-based feature filtering module, the model significantly enhances pest damage localization and feature consistency, enabling high-accuracy recognition under limited-sample conditions. In 5-shot classification tasks, CropCLR-Wheat achieves a precision of 89.4%, a recall of 87.1%, and an accuracy of 88.2%; these metrics further improve to 92.3%, 90.5%, and 91.2%, respectively, under the 10-shot setting. In the semantic segmentation of wheat pest damage regions, the model attains a mean intersection over union (mIoU) of 82.7%, with precision and recall reaching 85.2% and 82.4%, respectively, markedly outperforming advanced models such as SegFormer and Mask R-CNN. In robustness evaluation under viewpoint disturbances, a prediction consistency rate of 88.7%, a confidence variation of only 7.8%, and a prediction consistency score (PCS) of 0.914 are recorded, indicating strong stability and adaptability. Deployment results further demonstrate the framework’s practical viability: on the Jetson Nano device, an inference latency of 84 ms, a frame rate of 11.9 FPS, and an accuracy of 88.2% are achieved. These results confirm the efficiency of the proposed approach in edge computing environments. By balancing generalization performance with deployability, the proposed method provides robust support for intelligent agricultural terminal systems and holds substantial potential for wide-scale application. Full article
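
CropCLR-Wheat's self-supervised stage is contrastive; as generic background (not the paper's exact objective), the widely used NT-Xent loss for two augmented views of the same batch can be written as:

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.5):
    """NT-Xent loss over a batch of paired embeddings from two augmented views."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)                        # (2N, D)
    sim = z @ z.t() / temperature                         # scaled cosine similarities
    n = z1.size(0)
    sim.masked_fill_(torch.eye(2 * n, dtype=torch.bool, device=z.device), float("-inf"))
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)]).to(z.device)
    return F.cross_entropy(sim, targets)                  # positive pair = the other view

z1 = torch.randn(8, 128)
z2 = z1 + 0.1 * torch.randn(8, 128)                       # stand-in for a second augmentation
print(nt_xent(z1, z2))
```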

24 pages, 3622 KB  
Article
Simple and Affordable Vision-Based Detection of Seedling Deficiencies to Relieve Labor Shortages in Small-Scale Cruciferous Nurseries
by Po-Jui Su, Tse-Min Chen and Jung-Jeng Su
Agriculture 2025, 15(21), 2227; https://doi.org/10.3390/agriculture15212227 - 25 Oct 2025
Viewed by 223
Abstract
Labor shortages in seedling nurseries, particularly in manual inspection and replanting, hinder operational efficiency despite advancements in automation. This study aims to develop a cost-effective, GPU-free machine vision system to automate the detection of deficient seedlings in plug trays, specifically for small-scale nursery operations. The proposed Deficiency Detection and Replanting Positioning (DDRP) machine integrates low-cost components including an Intel RealSense Depth Camera D435, Raspberry Pi 4B, stepper motors, and a programmable logic controller (PLC). It utilizes OpenCV’s Haar cascade algorithm, HSV color space conversion, and Otsu thresholding to enable real-time image processing without GPU acceleration. Under controlled laboratory conditions, the DDRP-Machine achieved high detection accuracy (96.0–98.7%) and precision rates (82.14–83.78%). Benchmarking against deep-learning models such as YOLOv5x and Mask R-CNN showed comparable performance, while requiring only one-third to one-fifth of the cost and avoiding complex infrastructure. The Batch Detection (BD) mode significantly reduced processing time compared to Continuous Detection (CD), enhancing real-time applicability. The DDRP-Machine demonstrates strong potential to improve seedling inspection efficiency and reduce labor dependency in nursery operations. Its modular design and minimal hardware requirements make it a practical and scalable solution for resource-limited environments. This study offers a viable pathway for small-scale farms to adopt intelligent automation without the financial burden of high-end AI systems. Future enhancements, including adaptive lighting and self-learning capabilities, will further improve field robustness and broaden its applicability across diverse nursery conditions. Full article
(This article belongs to the Topic Digital Agriculture, Smart Farming and Crop Monitoring)
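
A minimal, GPU-free sketch in the spirit of the detection step described above, assuming a tray has already been cropped into individual plug cells: convert to HSV, Otsu-threshold the saturation channel, and flag cells with too few green pixels. The hue band, threshold, and minimum green fraction are illustrative values, not the authors' calibration.

```python
import cv2
import numpy as np

def cell_is_deficient(cell_bgr, min_green_fraction=0.05):
    """Flag a plug cell as missing a seedling when too few saturated green pixels remain."""
    hsv = cv2.cvtColor(cell_bgr, cv2.COLOR_BGR2HSV)
    _, sat_mask = cv2.threshold(hsv[:, :, 1], 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    hue = hsv[:, :, 0]
    green = (hue > 35) & (hue < 85) & (sat_mask > 0)   # rough green band on OpenCV's 0-179 hue scale
    return green.mean() < min_green_fraction

# Synthetic cell: dark substrate with one green blob standing in for a seedling.
cell = np.full((100, 100, 3), 40, np.uint8)
cv2.circle(cell, (50, 50), 18, (40, 160, 60), -1)      # BGR green
print("replant needed" if cell_is_deficient(cell) else "seedling present")
```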

30 pages, 4298 KB  
Article
Integrating Convolutional, Transformer, and Graph Neural Networks for Precision Agriculture and Food Security
by Esraa A. Mahareek, Mehmet Akif Cifci and Abeer S. Desuky
AgriEngineering 2025, 7(10), 353; https://doi.org/10.3390/agriengineering7100353 - 19 Oct 2025
Viewed by 796
Abstract
Ensuring global food security requires accurate and robust solutions for crop health monitoring, weed detection, and large-scale land-cover classification. To this end, we propose AgroVisionNet, a hybrid deep learning framework that integrates Convolutional Neural Networks (CNNs) for local feature extraction, Vision Transformers (ViTs) for capturing long-range global dependencies, and Graph Neural Networks (GNNs) for modeling spatial relationships between image regions. The framework was evaluated on five diverse benchmark datasets—PlantVillage (leaf-level disease detection), Agriculture-Vision (field-scale anomaly segmentation), BigEarthNet (satellite-based land-cover classification), UAV Crop and Weed (weed segmentation), and EuroSAT (multi-class land-cover recognition). Across these datasets, AgroVisionNet consistently outperformed strong baselines including ResNet-50, EfficientNet-B0, ViT, and Mask R-CNN. For example, it achieved 97.8% accuracy and 95.6% IoU on PlantVillage, 94.5% accuracy on Agriculture-Vision, 92.3% accuracy on BigEarthNet, 91.5% accuracy on UAV Crop and Weed, and 96.4% accuracy on EuroSAT. These results demonstrate the framework’s robustness across tasks ranging from fine-grained disease detection to large-scale anomaly mapping. The proposed hybrid approach addresses persistent challenges in agricultural imaging, including class imbalance, image quality variability, and the need for multi-scale feature integration. By combining complementary architectural strengths, AgroVisionNet establishes a new benchmark for deep learning applications in precision agriculture. Full article

25 pages, 9844 KB  
Article
Deep Learning and Geometric Modeling for 3D Reconstruction of Subsurface Utilities from GPR Data
by Peyman Jafary, Davood Shojaei and Krista A. Ehinger
Sensors 2025, 25(20), 6414; https://doi.org/10.3390/s25206414 - 17 Oct 2025
Viewed by 590
Abstract
Accurate underground utility mapping remains a critical yet complex task in Ground Penetrating Radar (GPR) interpretation, essential to avoiding costly and dangerous excavation errors. This study presents a novel deep learning-based pipeline for 3D reconstruction of buried linear utilities from high-resolution GPR B-scan data. Three state-of-the-art models—YOLOv8, YOLOv11, and Mask R-CNN—were employed for both bounding box and keypoint detection of hyperbolic reflections, using a real-world GPR dataset. On the test set, Mask R-CNN achieved the highest keypoint F1-score (0.822) and bounding box F1-score (0.867), outperforming the YOLO models. Detected summit points were clustered using a 3D DBSCAN algorithm to approximate the spatial trajectories of buried utilities. RANSAC-based line fitting was then applied to each cluster, yielding an average RMSE of 0.06 across all fitted 3D paths. The key innovation of this hybrid model lies in its use of real-world data (avoiding synthetic augmentation), direct summit point detection (beyond bounding box analysis), and a geometric 3D reconstruction pipeline. This approach addresses key limitations in prior studies, including poor generalizability to complex real-world scenarios and the reliance on full 3D data volumes. Our method offers a more practical and scalable solution for subsurface utility mapping in real-world settings. Full article
(This article belongs to the Section Radar Sensors)
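
The clustering-and-fitting stage described above can be approximated with off-the-shelf tools; the sketch below runs DBSCAN on synthetic summit points and uses a least-squares (SVD) line fit in place of full RANSAC, with eps and min_samples chosen arbitrarily rather than taken from the paper.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def fit_line(points):
    """Least-squares 3D line fit: returns a point on the line and a unit direction."""
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    return centroid, vt[0]

# Synthetic summit points along two buried "pipes" (x, y, depth), with noise.
t = np.linspace(0, 10, 60)
pipe1 = np.stack([t, 0.3 * t + 1.0, np.full_like(t, 1.2)], axis=1)
pipe2 = np.stack([t, -0.2 * t + 8.0, np.full_like(t, 0.9)], axis=1)
summits = np.concatenate([pipe1, pipe2]) + np.random.normal(0, 0.05, (120, 3))

labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(summits)
for cid in sorted(set(labels) - {-1}):                 # -1 marks DBSCAN noise
    cluster = summits[labels == cid]
    point, direction = fit_line(cluster)
    residuals = np.linalg.norm(np.cross(cluster - point, direction), axis=1)
    print(f"utility {cid}: {len(cluster)} points, RMSE {np.sqrt((residuals ** 2).mean()):.3f} m")
```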

20 pages, 5553 KB  
Article
An Improved Instance Segmentation Approach for Solid Waste Retrieval with Precise Edge from UAV Images
by Yaohuan Huang and Zhuo Chen
Remote Sens. 2025, 17(20), 3410; https://doi.org/10.3390/rs17203410 - 11 Oct 2025
Viewed by 468
Abstract
As a major contributor to environmental pollution in recent years, solid waste has become an increasingly significant concern in the realm of sustainable development. Unmanned Aerial Vehicle (UAV) imagery, known for its high spatial resolution, has become a valuable data source for solid waste detection. However, manually interpreting solid waste in UAV images is inefficient, and object detection methods encounter serious challenges due to the patchy distribution, varied textures and colors, and fragmented edges of solid waste. In this study, we proposed an improved instance segmentation approach called Watershed Mask Network for Solid Waste (WMNet-SW) to accurately retrieve solid waste with precise edges from UAV images. This approach combined the well-established Mask R-CNN segmentation framework with the watershed transform edge detection algorithm. The benchmark Mask R-CNN was improved by optimizing the anchor size and Region of Interest (RoI) and integrating a new mask head of Layer Feature Aggregation (LFA) to initially detect solid waste. Subsequently, edges of the detected solid waste were precisely adjusted by overlaying the segments generated by the watershed transform algorithm. Experimental results show that WMNet-SW significantly enhances the performance of Mask R-CNN in solid waste retrieval, increasing the average precision from 36.91% to 58.10%, F1-score from 0.5 to 0.65, and AP from 63.04% to 64.42%. Furthermore, our method efficiently detects the details of solid waste edges, even overcoming the limitations of training Ground Truth (GT). This study provides a solution for retrieving solid waste with precise edges from UAV images, thereby contributing to the protection of the regional environment and ecosystem health. Full article
(This article belongs to the Section Environmental Remote Sensing)
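
As a rough illustration of combining a coarse mask with watershed segments (a generic marker-based watershed recipe, not the WMNet-SW implementation; the overlap rule, kernel sizes, and synthetic data are assumptions):

```python
import cv2
import numpy as np

def refine_with_watershed(image_bgr, coarse_mask):
    """Keep watershed segments that fall mostly inside a coarse instance mask."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    dist = cv2.distanceTransform(binary, cv2.DIST_L2, 5)
    _, sure_fg = cv2.threshold(dist, 0.4 * dist.max(), 255, cv2.THRESH_BINARY)
    sure_fg = sure_fg.astype(np.uint8)
    sure_bg = cv2.dilate(binary, np.ones((5, 5), np.uint8), iterations=3)
    unknown = cv2.subtract(sure_bg, sure_fg)
    _, markers = cv2.connectedComponents(sure_fg)
    markers = markers + 1                               # reserve label 1 for background
    markers[unknown > 0] = 0
    markers = cv2.watershed(image_bgr, markers)

    refined = np.zeros(coarse_mask.shape, np.uint8)
    for seg_id in np.unique(markers):
        if seg_id <= 1:                                 # skip watershed boundary (-1) and background (1)
            continue
        segment = markers == seg_id
        if coarse_mask[segment].mean() > 0.5:           # segment lies mostly inside the coarse mask
            refined[segment] = 1
    return refined

# Synthetic example: two bright piles and a box-like coarse mask covering them.
img = np.zeros((200, 200, 3), np.uint8)
cv2.circle(img, (70, 100), 40, (200, 200, 200), -1)
cv2.circle(img, (140, 100), 40, (180, 180, 180), -1)
coarse = np.zeros((200, 200), np.uint8)
cv2.rectangle(coarse, (25, 55), (185, 145), 1, -1)
print(refine_with_watershed(img, coarse).sum(), "pixels kept")
```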

19 pages, 4825 KB  
Article
Research on Instance Segmentation Algorithm for Caged Chickens in Infrared Images Based on Improved Mask R-CNN
by Youqing Chen, Hang Liu, Lun Wang, Chen Chen, Siyu Li, Binyuan Zhong, Jihui Qiao, Rong Ye and Tong Li
Sensors 2025, 25(19), 6237; https://doi.org/10.3390/s25196237 - 8 Oct 2025
Viewed by 452
Abstract
Infrared images of caged chickens can provide valuable insights into their health status. Accurately detecting and segmenting individual chickens in these images is essential for effective health monitoring in large-scale chicken farming. However, the presence of obstacles such as cages, feeders, and drinkers can obscure the chickens, while clustering and overlapping among them may further hinder segmentation accuracy. This study proposes a Mask R-CNN-based instance segmentation algorithm specifically designed for caged chickens in infrared images. The backbone network is enhanced by incorporating the CBAM within this algorithm, which is further combined with the AC-FPN architecture to improve the model’s ability to extract features. Experimental results demonstrate that the model achieves average AP and AR10 values of 78.66% and 85.80%, respectively, in object detection, as per the COCO performance metrics. In segmentation tasks, the model attains average AP and AR10 values of 73.94% and 80.42%, respectively, reflecting improvements of 32.91% and 17.78% over the original model. Notably, among all categories of chicken flocks, the ‘Chicken-many’ category achieved an impressive average segmentation accuracy of 98.51%, and the other categories also surpassed 93%. The proposed instance segmentation method for caged chickens in infrared images effectively facilitates the recognition and segmentation of chickens within the challenging imaging conditions typical of high-density caged environments, thereby contributing to enhanced production efficiency and the advancement of intelligent breeding management. Full article
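
For reference, a Convolutional Block Attention Module (CBAM) of the kind inserted into the backbone combines channel and spatial attention. The sketch below follows the original CBAM formulation with common default sizes, which are not necessarily those used in this study.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, channels, reduction=16, spatial_kernel=7):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        self.spatial = nn.Conv2d(2, 1, spatial_kernel, padding=spatial_kernel // 2, bias=False)

    def forward(self, x):
        # Channel attention: shared MLP over global average- and max-pooled descriptors.
        avg = self.mlp(x.mean(dim=(2, 3), keepdim=True))
        mx = self.mlp(x.amax(dim=(2, 3), keepdim=True))
        x = x * torch.sigmoid(avg + mx)
        # Spatial attention: 7x7 conv over channel-wise average and max maps.
        attn = self.spatial(torch.cat([x.mean(dim=1, keepdim=True),
                                       x.amax(dim=1, keepdim=True)], dim=1))
        return x * torch.sigmoid(attn)

features = torch.randn(2, 256, 64, 64)
print(CBAM(256)(features).shape)        # torch.Size([2, 256, 64, 64])
```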

14 pages, 1787 KB  
Article
HE-DMDeception: Adversarial Attack Network for 3D Object Detection Based on Human Eye and Deep Learning Model Deception
by Pin Zhang, Yawen Liu, Heng Liu, Yichao Teng, Jiazheng Ni, Zhuansun Xiaobo and Jiajia Wang
Information 2025, 16(10), 867; https://doi.org/10.3390/info16100867 - 7 Oct 2025
Viewed by 423
Abstract
This paper presents HE-DMDeception, a novel adversarial attack network that integrates human visual deception with deep model deception to enhance the security of 3D object detection. Existing patch-based and camouflage methods can mislead deep learning models but struggle to generate visually imperceptible, high-quality textures. Our framework employs a CycleGAN-based camouflage network to generate highly camouflaged background textures, while a dedicated deception module disrupts non-maximum suppression (NMS) and attention mechanisms through optimized constraints that balance attack efficacy and visual fidelity. To overcome the scarcity of annotated vehicle data, an image segmentation module based on the pre-trained Segment Anything Model (SAM) is introduced, leveraging a two-stage training strategy combining semi-supervised self-training and supervised fine-tuning. Experimental results show that the minimum P@0.5 values (50%, 55%, 20%, 25%, 25%) were achieved by HE-DMDeception across You Only Look Once version 8 (YOLOv8), Real-Time Detection Transformer (RT-DETR), Faster Region-based Convolutional Neural Network (Faster R-CNN), Single Shot MultiBox Detector (SSD), and Mask Region-based Convolutional Neural Network (Mask R-CNN) detection models, while maintaining high visual consistency with the original camouflage. These findings demonstrate the robustness and practicality of HE-DMDeception, offering new insights into 3D object detection adversarial attacks. Full article

23 pages, 4731 KB  
Article
Advancing Urban Roof Segmentation: Transformative Deep Learning Models from CNNs to Transformers for Scalable and Accurate Urban Imaging Solutions—A Case Study in Ben Guerir City, Morocco
by Hachem Saadaoui, Saad Farah, Hatim Lechgar, Abdellatif Ghennioui and Hassan Rhinane
Technologies 2025, 13(10), 452; https://doi.org/10.3390/technologies13100452 - 6 Oct 2025
Viewed by 1001
Abstract
Urban roof segmentation plays a pivotal role in applications such as urban planning, infrastructure management, and renewable energy deployment. This study explores the evolution of deep learning techniques from traditional Convolutional Neural Networks (CNNs) to cutting-edge transformer-based models in the context of roof segmentation from satellite imagery. We highlight the limitations of conventional methods when applied to urban environments, including resolution constraints and the complexity of roof structures. To address these challenges, we evaluate two advanced deep learning models, Mask R-CNN and MaskFormer, which have shown significant promise in accurately segmenting roofs, even in dense urban settings with diverse roof geometries. These models, especially the one based on transformers, offer improved segmentation accuracy by capturing both global and local image features, enhancing their performance in tasks where fine detail and contextual awareness are critical. A case study on Ben Guerir City in Morocco, an urban area experiencing rapid development, serves as the foundation for testing these models. Using high-resolution satellite imagery, the segmentation results offer a deeper understanding of the accuracy and effectiveness of these models, particularly in optimizing urban planning and renewable energy assessments. Quantitative metrics such as Intersection over Union (IoU), precision, recall, and F1-score are used to benchmark model performance. Mask R-CNN achieved a mean IoU of 74.6%, precision of 81.3%, recall of 78.9%, and F1-score of 80.1%, while MaskFormer reached a mean IoU of 79.8%, precision of 85.6%, recall of 82.7%, and F1-score of 84.1% (pixel-level, micro-averaged at IoU = 0.50 on the held-out test set), highlighting the transformative potential of transformer-based architectures for scalable and precise urban imaging. The study also outlines future work in 3D modeling and height estimation, positioning these advancements as critical tools for sustainable urban development. Full article
(This article belongs to the Section Information and Communication Technologies)

28 pages, 5791 KB  
Article
Tree Health Assessment Using Mask R-CNN on UAV Multispectral Imagery over Apple Orchards
by Mohadeseh Kaviani, Brigitte Leblon, Thangarajah Akilan, Dzhamal Amishev, Armand LaRocque and Ata Haddadi
Remote Sens. 2025, 17(19), 3369; https://doi.org/10.3390/rs17193369 - 6 Oct 2025
Viewed by 681
Abstract
Accurate tree health monitoring in orchards is essential for optimal orchard production. This study investigates the efficacy of a deep learning-based object detection single-step method for detecting tree health on multispectral UAV imagery. A modified Mask R-CNN framework is employed with four different backbones—ResNet-50, ResNet-101, ResNeXt-101, and Swin Transformer—on three image combinations: (1) RGB images, (2) 5-band multispectral images comprising RGB, Red-Edge, and Near-Infrared (NIR) bands, and (3) three principal components (3PCs) computed from the reflectance of the five spectral bands and twelve associated vegetation index images. The Mask R-CNN, having a ResNeXt-101 backbone, and applied to the 5-band multispectral images, consistently outperforms other configurations, with an F1-score of 85.68% and a mean Intersection over Union (mIoU) of 92.85%. To address the class imbalance, class weighting and focal loss were integrated into the model, yielding improvements in the detection of the minority class, i.e., the unhealthy trees. The tested method has the advantage of allowing the detection of unhealthy trees over UAV images using a single-step approach. Full article
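
The class-imbalance handling mentioned above relies on class weighting and focal loss; a compact binary focal-loss reference in PyTorch (generic, with the usual alpha and gamma defaults rather than the authors' settings) is:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss on raw logits; down-weights easy examples by (1 - p_t)^gamma."""
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = targets * p + (1 - targets) * (1 - p)
    alpha_t = targets * alpha + (1 - targets) * (1 - alpha)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()

logits = torch.randn(8)
targets = torch.randint(0, 2, (8,)).float()
print(focal_loss(logits, targets))
```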

18 pages, 4927 KB  
Article
Automated Grading of Boiled Shrimp by Color Level Using Image Processing Techniques and Mask R-CNN with Feature Pyramid Networks
by Manit Chansuparp, Nantipa Pansawat and Sansanee Wangvoralak
Appl. Sci. 2025, 15(19), 10632; https://doi.org/10.3390/app151910632 - 1 Oct 2025
Viewed by 393
Abstract
Color grading of boiled shrimp is a critical factor influencing market price, yet the process is usually conducted visually by buyers such as middlemen and processing plants. This subjective practice raises concerns about accuracy, impartiality, and fairness, often resulting in disputes with farmers. To address this issue, this study proposes a standardized and automated grading approach based on image processing and artificial intelligence. The method requires only a photograph of boiled shrimp placed alongside a color grading ruler. The grading process involves two stages: segmentation of shrimp and ruler regions in the image, followed by color comparison. For segmentation, deep learning models based on Mask R-CNN with a Feature Pyramid Network backbone were employed. Four model configurations were tested, using ResNet and ResNeXt backbones with and without a Boundary Loss function. Results show that the ResNet + Boundary Loss model achieved the highest segmentation performance, with IoU scores of 91.2% for shrimp and 87.8% for the color ruler. In the grading step, color similarity was evaluated in the CIELAB color space by computing Euclidean distances in the L (lightness) and a (red–green) channels, which align closely with human perception of shrimp coloration. The system achieved grading accuracy comparable to human experts, with a mean absolute error of 1.2, demonstrating its potential to provide consistent, objective, and transparent shrimp quality assessment. Full article
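
The color-matching step can be illustrated in a few lines: convert the shrimp pixels to CIELAB and pick the grade whose reference (L*, a*) pair is nearest in Euclidean distance. The reference values below are invented placeholders, not the paper's color-ruler calibration.

```python
import numpy as np
from skimage import color

# Reference (L*, a*) pairs per grade; purely illustrative values.
GRADE_LA = {1: (55.0, 30.0), 2: (50.0, 34.0), 3: (45.0, 38.0), 4: (40.0, 42.0)}

def grade_shrimp(rgb_pixels_uint8):
    """Assign the grade whose (L*, a*) reference is nearest to the mean shrimp color."""
    lab = color.rgb2lab(rgb_pixels_uint8.reshape(1, -1, 3) / 255.0).reshape(-1, 3)
    mean_L, mean_a = lab[:, 0].mean(), lab[:, 1].mean()
    dists = {g: np.hypot(mean_L - L, mean_a - a) for g, (L, a) in GRADE_LA.items()}
    return min(dists, key=dists.get)

shrimp_pixels = np.tile(np.array([210, 120, 110], dtype=np.uint8), (500, 1))   # flat orange-pink patch
print("grade", grade_shrimp(shrimp_pixels))
```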

19 pages, 2933 KB  
Article
Image-Based Detection of Chinese Bayberry (Myrica rubra) Maturity Using Cascaded Instance Segmentation and Multi-Feature Regression
by Hao Zheng, Li Sun, Yue Wang, Han Yang and Shuwen Zhang
Horticulturae 2025, 11(10), 1166; https://doi.org/10.3390/horticulturae11101166 - 1 Oct 2025
Viewed by 355
Abstract
The accurate assessment of Chinese bayberry (Myrica rubra) maturity is critical for intelligent harvesting. This study proposes a novel cascaded framework combining instance segmentation and multi-feature regression for accurate maturity detection. First, a lightweight SOLOv2-Light network is employed to segment each fruit individually, which significantly reduces computational costs with only a marginal drop in accuracy. Then, a multi-feature extraction network is developed to fuse deep semantic, color (LAB space), and multi-scale texture features, enhanced by a channel attention mechanism for adaptive weighting. The maturity ground truth is defined using the a*/b* ratio measured by a colorimeter, which correlates strongly with anthocyanin accumulation and visual ripeness. Experimental results demonstrated that the proposed method achieves a mask mAP of 0.788 on the instance segmentation task, outperforming Mask R-CNN and YOLACT. For maturity prediction, a mean absolute error of 3.946% is attained, which is a significant improvement over the baseline. When the data are discretized into three maturity categories, the overall accuracy reaches 95.51%, surpassing YOLOX-s and Faster R-CNN by a considerable margin while reducing processing time by approximately 46%. The modular design facilitates easy adaptation to new varieties. This research provides a robust and efficient solution for in-field bayberry maturity detection, offering substantial value for the development of automated harvesting systems. Full article
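
A minimal sketch of the a*/b* ground-truth quantity described above, applied to a segmented fruit region (the synthetic image and mask are purely illustrative):

```python
import numpy as np
from skimage import color

def maturity_ab_ratio(rgb_image, mask):
    """Mean a*/b* ratio over a segmented fruit; higher ratios indicate riper, redder fruit."""
    lab = color.rgb2lab(rgb_image)
    a, b = lab[:, :, 1][mask], lab[:, :, 2][mask]
    return float(a.mean() / (b.mean() + 1e-6))

# Synthetic check: a dark-red patch (ripe-looking) inside a green background.
img = np.zeros((64, 64, 3))
img[:, :] = (0.2, 0.5, 0.2)            # green background, float RGB in [0, 1]
img[20:44, 20:44] = (0.45, 0.05, 0.1)  # reddish fruit region
mask = np.zeros((64, 64), dtype=bool)
mask[20:44, 20:44] = True
print(f"a*/b* ratio: {maturity_ab_ratio(img, mask):.2f}")
```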
