Search Results (1,911)

Search Parameters:
Keywords = CNN segmentation

18 pages, 2143 KB  
Article
Application of StarDist to Diagnostic-Grade White Blood Cells Segmentation in Whole Slide Images
by Julius Bamwenda, Mehmet Siraç Özerdem, Orhan Ayyildiz and Veysi Akpolat
Electronics 2025, 14(17), 3538; https://doi.org/10.3390/electronics14173538 - 4 Sep 2025
Abstract
Accurate and automated segmentation of white blood cells (WBCs) in whole slide images (WSIs) is a critical step in computational pathology. This study presents a comprehensive evaluation and enhancement of the StarDist algorithm, leveraging its star-convex polygonal modeling to improve segmentation precision in complex WSI datasets. Our pipeline integrates tailored preprocessing, expert annotations from QuPath, and adaptive learning strategies for model training. Comparative analysis with U-Net and Mask R-CNN demonstrates StarDist’s superiority across multiple performance metrics, including Dice coefficient (0.89), precision (0.99), and IoU (0.95). Visual evaluations further highlight its robustness in handling overlapping cells and staining inconsistencies. The study establishes StarDist as a reliable tool for digital pathology, with potential integration into clinical decision-support systems. In addition to Dice and IoU, metrics such as Aggregated Jaccard Index and Boundary F1-Score are gaining popularity for biomedical segmentation. Preprocessing techniques like Macenko stain normalization and adaptive histogram equalization can further improve generalizability. QuPath, an open-source digital pathology platform, was utilized to perform accurate WBC annotations prior to training and evaluation. Full article
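
As a rough illustration of the star-convex instance segmentation described above, the sketch below runs a publicly available pretrained StarDist model and computes a Dice score against a binary mask. The pretrained "2D_versatile_he" weights, file paths, and normalization percentiles are placeholders, not the authors' WSI-trained pipeline.

```python
# Minimal StarDist sketch (pip install stardist csbdeep scikit-image); the model
# and image paths below are stand-ins, not the paper's trained WBC model.
import numpy as np
from csbdeep.utils import normalize
from stardist.models import StarDist2D
from skimage.io import imread

model = StarDist2D.from_pretrained("2D_versatile_he")      # placeholder H&E weights
img = imread("wbc_patch.png")                               # hypothetical WSI patch

# Percentile normalization, then star-convex polygon instance prediction.
labels, details = model.predict_instances(normalize(img, 1, 99.8))

# Dice coefficient against a hypothetical binary ground-truth mask.
gt = imread("wbc_patch_mask.png", as_gray=True) > 0
pred = labels > 0
dice = 2.0 * np.logical_and(pred, gt).sum() / (pred.sum() + gt.sum())
print(f"instances: {labels.max()}, Dice: {dice:.3f}")
```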

25 pages, 6014 KB  
Article
Enhancing Instance Segmentation in Agriculture: An Optimized YOLOv8 Solution
by Qiaolong Wang, Dongshun Chen, Wenfei Feng, Liang Sun and Gaohong Yu
Sensors 2025, 25(17), 5506; https://doi.org/10.3390/s25175506 - 4 Sep 2025
Abstract
To address the limitations of traditional segmentation algorithms in processing complex agricultural scenes, this paper proposes an improved YOLOv8n-seg model. Building upon the original three detection layers, we introduce a dedicated layer for small object detection, which significantly enhances the detection accuracy of small targets (e.g., people) after processing images through fourfold downsampling. In the neck network, we replace the C2f module with our proposed C2f_CPCA module, which incorporates a channel prior attention mechanism (CPCA). This mechanism dynamically adjusts attention weights across channels and spatial dimensions to effectively capture relationships between different spatial scales, thereby improving feature extraction and recognition capabilities while maintaining low computational complexity. Finally, we propose a C3RFEM module based on the RFEM architecture and integrate it into the main network. This module combines dilated convolutions and weighted layers to enhance feature extraction capabilities across different receptive field ranges. Experimental results demonstrated that the improved model achieved 1.4% and 4.0% increases in precision and recall rates on private datasets, respectively, with mAP@0.5 and mAP@0.5:0.95 metrics improved by 3.0% and 3.5%, respectively. In comparative evaluations with instance segmentation algorithms such as the YOLOv5 series, YOLOv7, YOLOv8n, YOLOv9t, YOLOv10n, YOLOv10s, Mask R-CNN, and Mask2Former, our model achieved an optimal balance between computational efficiency and detection performance. This demonstrates its potential for the research and development of small intelligent precision operation technology and equipment. Full article
(This article belongs to the Section Smart Agriculture)
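
The sketch below shows a simplified channel-then-spatial attention block loosely in the spirit of the channel prior attention described above. It is not the authors' C2f_CPCA module; the kernel sizes, reduction ratio, and feature shapes are illustrative assumptions.

```python
# A loose channel-prior-style attention sketch in PyTorch; names and sizes are illustrative.
import torch
import torch.nn as nn

class ChannelPriorAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        # Channel attention: global average pooling followed by a bottleneck MLP.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
        )
        # Depth-wise convolution as a cheap spatial mixer.
        self.spatial = nn.Conv2d(channels, channels, 5, padding=2, groups=channels)

    def forward(self, x):
        w = torch.sigmoid(self.mlp(x.mean(dim=(2, 3), keepdim=True)))  # channel weights
        x = x * w                                                      # channel prior
        return x * torch.sigmoid(self.spatial(x))                      # spatial weighting

feat = torch.randn(1, 64, 80, 80)
print(ChannelPriorAttention(64)(feat).shape)   # torch.Size([1, 64, 80, 80])
```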

18 pages, 3460 KB  
Article
Explainable Multi-Frequency Long-Term Spectrum Prediction Based on GC-CNN-LSTM
by Wei Xu, Jianzhao Zhang, Zhe Su and Luliang Jia
Electronics 2025, 14(17), 3530; https://doi.org/10.3390/electronics14173530 - 4 Sep 2025
Abstract
The rapid development of wireless communication technology is leading to increasingly scarce spectrum resources, making efficient utilization a critical challenge. This paper proposes a Convolutional Neural Network–Long Short-Term Memory-Integrated Gradient-Weighted Class Activation Mapping (GC-CNN-LSTM) model, aimed at enhancing the accuracy of long-term spectrum prediction across multiple frequency bands and improving model interpretability. First, we achieve multi-frequency long-term spectrum prediction using a CNN-LSTM and compare its performance against models including LSTM, GRU, CNN, Transformer, and CNN-LSTM-Attention. Next, we use an improved Grad-CAM method to explain the model and obtain global heatmaps in the time–frequency domain. Finally, based on these interpretable results, we optimize the input data by selecting high-importance frequency points and removing low-importance time segments, thereby enhancing prediction accuracy. The simulation results show that the Grad-CAM-based approach achieves good interpretability, reducing RMSE and MAPE by 6.22% and 4.25%, respectively, compared to CNN-LSTM, while a similar optimization using SHapley Additive exPlanations (SHAP) achieves reductions of 0.86% and 3.55%. Full article
(This article belongs to the Special Issue How Graph Convolutional Networks Work: Mechanisms and Models)
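
A minimal CNN-LSTM for multi-frequency spectrum prediction is sketched below to make the base architecture concrete. It is not the paper's GC-CNN-LSTM (the Grad-CAM explanation stage is omitted), and the layer sizes, sequence length, and number of frequency points are assumptions.

```python
# Minimal CNN-LSTM sketch for next-step prediction over multiple frequency points.
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    def __init__(self, n_freq: int = 32, hidden: int = 64):
        super().__init__()
        # 1-D convolutions over time, treating frequency points as channels.
        self.cnn = nn.Sequential(
            nn.Conv1d(n_freq, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.lstm = nn.LSTM(64, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_freq)            # next-step value per frequency

    def forward(self, x):                                # x: (batch, time, n_freq)
        h = self.cnn(x.transpose(1, 2)).transpose(1, 2)  # (batch, time, 64)
        out, _ = self.lstm(h)
        return self.head(out[:, -1])                     # (batch, n_freq)

x = torch.randn(8, 96, 32)       # 8 sequences, 96 time steps, 32 frequency points
print(CNNLSTM()(x).shape)        # torch.Size([8, 32])
```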

21 pages, 760 KB  
Article
Forecasting Financial Volatility Under Structural Breaks: A Comparative Study of GARCH Models and Deep Learning Techniques
by Víctor Chung, Jenny Espinoza and Renán Quispe
J. Risk Financial Manag. 2025, 18(9), 494; https://doi.org/10.3390/jrfm18090494 - 4 Sep 2025
Abstract
The main objective of this study is to evaluate the predictive performance of traditional econometric models and deep learning techniques in forecasting financial volatility under structural breaks. Using daily data from four Latin American stock market indices between 2000 and 2024, we compare GARCH models with neural networks such as LSTM and CNN. Structural breaks are identified through a modified ICSS algorithm and incorporated into the GARCH framework via regime segmentation. The results show that neglecting breaks overstates volatility persistence and weakens predictive accuracy, while accounting for them improves GARCH forecasts only in specific cases. By contrast, deep learning models consistently outperform GARCH alternatives at medium- and long-term horizons, capturing nonlinear and time-varying dynamics more effectively. This study contributes to the literature by bridging econometric and deep learning approaches and offers practical insights for policymakers and investors in emerging markets facing recurrent structural instability. Full article
(This article belongs to the Section Financial Technology and Innovation)
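
For readers unfamiliar with the econometric baseline, the sketch below fits a GARCH(1,1) model with the `arch` package on synthetic daily returns and produces multi-step variance forecasts. The modified ICSS break detection and regime segmentation described above are not reproduced; the data and distribution choice are assumptions.

```python
# GARCH(1,1) fit and variance forecast with the `arch` package (pip install arch).
import numpy as np
import pandas as pd
from arch import arch_model

rng = np.random.default_rng(0)
returns = pd.Series(rng.standard_t(df=5, size=2000) * 0.8)   # synthetic daily % returns

am = arch_model(returns, mean="Constant", vol="Garch", p=1, q=1, dist="t")
res = am.fit(disp="off")
print(res.params)

# 10-step-ahead conditional variance forecast from the last observation.
fcast = res.forecast(horizon=10)
print(fcast.variance.iloc[-1])
```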

27 pages, 10494 KB  
Article
Data-Model Complexity Trade-Off in UAV-Acquired Ultra-High-Resolution Remote Sensing: Empirical Study on Photovoltaic Panel Segmentation
by Zhigang Zou, Xinhui Zhou, Pukaiyuan Yang, Jingyi Liu and Wu Yang
Drones 2025, 9(9), 619; https://doi.org/10.3390/drones9090619 - 3 Sep 2025
Abstract
With the growing adoption of deep learning in remote sensing, the increasing diversity of models and datasets has made method selection and experimentation more challenging, especially for non-expert users. This study presents a comprehensive evaluation of photovoltaic panel segmentation using a large-scale ultra-high-resolution benchmark of over 25,000 manually annotated unmanned aerial vehicle image patches, systematically quantifying the impact of model and data characteristics. Our results indicate that increasing the spatial diversity of training data has a more substantial impact on training stability and segmentation accuracy than simply adding spectral bands or enlarging the dataset volume. Across all experimental settings, moderate-sized models (DeepLabV3_50, ResUNet50, and SegFormer B4) often provided the best trade-off between segmentation performance and computational efficiency, achieving an average Intersection over Union (IoU) of 0.8966, comparable to the 0.8970 of larger models. Moreover, model architecture plays a more critical role than model size: the ResUNet models consistently achieved higher mean IoU than both DeepLabV3 and SegFormer models, with average improvements of 0.047 and 0.143, respectively. Our findings offer quantitative guidance for balancing architectural choices, model complexity, and dataset design, ultimately promoting more robust and efficient deployment of deep learning models in high-resolution remote sensing applications. Full article
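
The mean Intersection-over-Union metric used to compare the models above can be computed as in the short NumPy sketch below; the masks are synthetic placeholders and the two-class setting (panel vs. background) is an assumption.

```python
# Mean IoU over classes from predicted and ground-truth label maps.
import numpy as np

def mean_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int = 2) -> float:
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))

pred = np.random.randint(0, 2, (512, 512))   # hypothetical panel/background prediction
gt = np.random.randint(0, 2, (512, 512))     # hypothetical ground truth
print(f"mIoU: {mean_iou(pred, gt):.4f}")
```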

23 pages, 6857 KB  
Article
Multi-Class Classification of Breast Ultrasound Images Using Vision Transformer-Based Ensemble Learning
by Tuğçe Taşar Yıldırım, Orhan Yaman, İrfan Kılıç, Beyda Taşar, Esra Suay Timurkaan and Nesibe Aydoğdu
Diagnostics 2025, 15(17), 2235; https://doi.org/10.3390/diagnostics15172235 - 3 Sep 2025
Abstract
Background/Objectives: In this study, a vision transformer (ViT) based ensemble architecture was developed for the classification of normal, benign, and malignant diseases from breast ultrasound images. The breast ultrasound images (BUSI) dataset was used for the implementation of the proposed method. This dataset includes 133 normal, 437 benign, and 210 malignant ultrasound images. Methods: ROI segmentation and image preprocessing were applied to the dataset to select only the tumor region and use it in the model. Thus, a better performance was achieved using only the lesion regions. Image augmentation was performed using the Albumentations library to increase the number of images. Feature extraction was performed on the obtained images using three ViT-based models (ViT-Base, DeiT, ViT-Small). The purpose of using three different models is to achieve high accuracy. The extracted features were classified using a multilayer perceptron (MLP). Training was performed using 10-fold stratified cross-validation. Results: Stratified cross-validation ensures that images from all three classes are represented in each fold. The proposed model provided 96.2% precision and 86.3% recall for the benign class and 92.9% recall and 76.4% precision for the malignant class. The normal class achieved 100% success. The area under the curve (AUC) values were 0.97, 0.96, and 1.00 for the benign, malignant, and normal classes, respectively. Conclusions: The ROI-based ViT + MLP + Ensemble architecture provided higher accuracy and explainability compared to traditional convolutional neural network (CNN) based methods in medical image classification. It demonstrated stable success, especially in minority classes, and presented a promising, reliable, and flexible solution for clinical decision support systems. Full article
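
The evaluation protocol described above can be sketched with scikit-learn: stratified 10-fold cross-validation over pre-extracted feature vectors classified by an MLP. The ViT feature extraction itself is omitted here, and the feature dimensions, sample counts, and MLP size are assumptions, not the authors' settings.

```python
# Stratified 10-fold CV over placeholder (e.g., ViT-derived) features with an MLP classifier.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import precision_recall_fscore_support

X = np.random.randn(780, 768 * 3)     # concatenated features from three backbones (placeholder)
y = np.random.randint(0, 3, 780)      # 0 = normal, 1 = benign, 2 = malignant

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
for fold, (tr, te) in enumerate(skf.split(X, y)):
    clf = MLPClassifier(hidden_layer_sizes=(256,), max_iter=300, random_state=0)
    clf.fit(X[tr], y[tr])
    p, r, f1, _ = precision_recall_fscore_support(
        y[te], clf.predict(X[te]), average="macro", zero_division=0)
    print(f"fold {fold}: precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```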

19 pages, 3770 KB  
Article
Segmentation of 220 kV Cable Insulation Layers Using WGAN-GP-Based Data Augmentation and the TransUNet Model
by Liang Luo, Song Qing, Yingjie Liu, Guoyuan Lu, Ziying Zhang, Yuhang Xia, Yi Ao, Fanbo Wei and Xingang Chen
Energies 2025, 18(17), 4667; https://doi.org/10.3390/en18174667 - 2 Sep 2025
Abstract
This study presents a segmentation framework for images of 220 kV cable insulation that addresses sample scarcity and blurred boundaries. The framework integrates data augmentation using the Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP) and the TransUNet architecture. Considering the difficulty and high cost of obtaining real cable images, WGAN-GP generates high-quality synthetic data to expand the dataset and improve the model’s generalization. The TransUNet network, designed to handle the structural complexity and indistinct edge features of insulation layers, combines the local feature extraction capability of convolutional neural networks (CNNs) with the global context modeling strength of Transformers. This combination enables accurate delineation of the insulation regions. The experimental results show that the proposed method achieves mDice, mIoU, MP, and mRecall scores of 0.9835, 0.9677, 0.9840, and 0.9831, respectively, with improvements of approximately 2.03%, 3.05%, 2.08%, and 1.98% over a UNet baseline. Overall, the proposed approach outperforms UNet, Swin-UNet, and Attention-UNet, confirming its effectiveness in delineating 220 kV cable insulation layers under complex structural and data-limited conditions. Full article
(This article belongs to the Special Issue Fault Detection and Diagnosis of Power Distribution System)
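
The WGAN-GP augmentation strategy mentioned above hinges on the gradient-penalty term, shown below as a generic PyTorch sketch; the critic architecture and training loop are not included, and `lambda_gp` is a hypothetical hyperparameter name.

```python
# Standard WGAN-GP gradient penalty: penalize deviation of the critic's gradient
# norm from 1 on points interpolated between real and generated samples.
import torch

def gradient_penalty(critic, real, fake, device="cpu"):
    eps = torch.rand(real.size(0), 1, 1, 1, device=device)
    x_hat = (eps * real + (1 - eps) * fake).requires_grad_(True)
    scores = critic(x_hat)
    grads = torch.autograd.grad(outputs=scores, inputs=x_hat,
                                grad_outputs=torch.ones_like(scores),
                                create_graph=True, retain_graph=True)[0]
    grads = grads.view(grads.size(0), -1)
    return ((grads.norm(2, dim=1) - 1) ** 2).mean()

# Usage inside the critic update (illustrative):
# loss_D = fake_score.mean() - real_score.mean() + lambda_gp * gradient_penalty(D, real, fake)
```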

19 pages, 805 KB  
Article
A Multi-Level Feature Fusion Network Integrating BERT and TextCNN
by Yangwu Zhang, Mingxiao Xu and Guohe Li
Electronics 2025, 14(17), 3508; https://doi.org/10.3390/electronics14173508 - 2 Sep 2025
Viewed by 29
Abstract
With the rapid growth of job-related crimes in developing economies, there is an urgent need for intelligent judicial systems to standardize sentencing practices. This study proposes a Multi-Level Feature Fusion Network (MLFFN) to enhance the accuracy and interpretability of sentencing predictions in job-related crime cases. The model integrates hierarchical legal feature representation, beginning with benchmark judgments (including starting-point penalties and additional penalties) as the foundational input. The frontend of MLFFN employs an attention mechanism to dynamically fuse word-level, segment-level, and position-level embeddings, generating a global feature encoding that captures critical legal relationships. The backend utilizes sliding-window convolutional kernels to extract localized features from the global feature map, preserving nuanced contextual factors that influence sentencing ranges. Trained on a dataset of job-related crime cases, MLFFN achieves a 6%+ performance improvement over the baseline models (BERT-base-Chinese, TextCNN, and ERNIE) in sentencing prediction accuracy. Our findings demonstrate that explicit modeling of legal hierarchies and contextual constraints significantly improves judicial AI systems. Full article
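
The sliding-window convolution stage described above resembles a TextCNN head over contextual token embeddings; the sketch below shows that stage only. The attention-based fusion of word-, segment-, and position-level embeddings is not reproduced, and the class count, kernel sizes, and filter counts are assumptions.

```python
# Compact TextCNN head over (e.g., BERT-style) token embeddings.
import torch
import torch.nn as nn

class TextCNNHead(nn.Module):
    def __init__(self, emb_dim: int = 768, num_classes: int = 5,
                 kernel_sizes=(2, 3, 4), n_filters: int = 128):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, n_filters, k) for k in kernel_sizes])
        self.fc = nn.Linear(n_filters * len(kernel_sizes), num_classes)

    def forward(self, tokens):                      # tokens: (batch, seq_len, emb_dim)
        x = tokens.transpose(1, 2)                  # (batch, emb_dim, seq_len)
        # Sliding-window convolutions + max-pooling over the sequence dimension.
        pooled = [torch.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=1))    # (batch, num_classes)

emb = torch.randn(4, 256, 768)                      # placeholder contextual embeddings
print(TextCNNHead()(emb).shape)                     # torch.Size([4, 5])
```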

16 pages, 2129 KB  
Article
A Multimodal Convolutional Neural Network Framework for Intelligent Real-Time Monitoring of Etchant Levels in PCB Etching Processes
by Chuen-Sheng Cheng, Pei-Wen Chen, Hen-Yi Jen and Yu-Tang Wu
Mathematics 2025, 13(17), 2804; https://doi.org/10.3390/math13172804 - 1 Sep 2025
Viewed by 150
Abstract
In recent years, machine learning (ML) techniques have gained significant attention in time series classification tasks, particularly in industrial applications where early detection of abnormal conditions is crucial. This study proposes an intelligent monitoring framework based on a multimodal convolutional neural network (CNN) to classify normal and abnormal copper ion (Cu2+) concentration states in the etching process in the printed circuit board (PCB) industry. Maintaining precise control of Cu2+ concentration is critical to ensuring the quality and reliability of the etching process. A sliding window approach is employed to segment the data into fixed-length intervals, enabling localized temporal feature extraction. The model fuses two input modalities, raw one-dimensional (1D) time series data and two-dimensional (2D) recurrence plots, allowing it to capture both temporal dynamics and spatial recurrence patterns. Comparative experiments with traditional machine learning classifiers and single-modality CNNs demonstrate that the proposed multimodal CNN significantly outperforms baseline models in terms of accuracy, precision, recall, F1-score, and G-measure. The results highlight the potential of multimodal deep learning in enhancing process monitoring and early fault detection in chemical-based manufacturing. This work contributes to the development of intelligent, adaptive quality control systems in the PCB industry. Full article
(This article belongs to the Special Issue Mathematics Methods of Robotics and Intelligent Systems)
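
The two input modalities described above can be generated as in the NumPy sketch below: fixed-length sliding-window segments of the concentration series and their binary recurrence-plot images. The window width, stride, recurrence threshold, and synthetic signal are illustrative assumptions.

```python
# Sliding-window segmentation and recurrence-plot construction from a 1-D series.
import numpy as np

def sliding_windows(series: np.ndarray, width: int, stride: int) -> np.ndarray:
    idx = range(0, len(series) - width + 1, stride)
    return np.stack([series[i:i + width] for i in idx])

def recurrence_plot(window: np.ndarray, eps: float = 0.1) -> np.ndarray:
    d = np.abs(window[:, None] - window[None, :])   # pairwise distance matrix
    return (d <= eps).astype(np.float32)            # binary recurrence image

signal = np.sin(np.linspace(0, 20, 600)) + 0.05 * np.random.randn(600)  # synthetic series
windows = sliding_windows(signal, width=64, stride=8)                   # 1-D inputs
images = np.stack([recurrence_plot(w) for w in windows])                # 2-D inputs
print(windows.shape, images.shape)   # (68, 64) (68, 64, 64)
```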

9 pages, 1286 KB  
Proceeding Paper
Grid and Refinement Double-Stage-Based Tumor Detection Using Ultrasonic Images
by Daisuke Osako and Jian-Jiun Ding
Eng. Proc. 2025, 108(1), 6; https://doi.org/10.3390/engproc2025108006 - 29 Aug 2025
Viewed by 114
Abstract
Accurate tumor segmentation is crucial for cancer diagnosis and treatment planning. We developed a hybrid framework combining complementary convolutional neural network (CNN) models and advanced post-processing techniques for robust segmentation. Model 1 uses contrast-limited adaptive histogram equalization preprocessing, CNN predictions, and active contour refinement, but struggles with complex tumor boundaries. Model 2 applies noise-augmented preprocessing and iterative detection to enhance the segmentation of subtle and irregular regions. The outputs of both models are merged and refined with edge correction, size filtering, and a spatial intensity metric (SIM) expansion to improve under-segmented areas, an approach that achieves higher F1 scores and intersection over union scores. The developed framework highlights the potential of combining machine learning and image-processing techniques to develop reliable diagnostic tools. Full article
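
Two of the pipeline steps described above, CLAHE preprocessing and post-processing that merges two masks and drops small components, are sketched below with OpenCV. The image path is hypothetical and the CNN predictions are represented by placeholder masks; the minimum-area threshold is an assumption.

```python
# CLAHE preprocessing plus mask merging and size filtering (pip install opencv-python).
import cv2
import numpy as np

img = cv2.imread("ultrasound.png", cv2.IMREAD_GRAYSCALE)          # hypothetical input
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
enhanced = clahe.apply(img)                                        # contrast-limited AHE

mask1 = np.zeros_like(enhanced)                                    # placeholder output, Model 1
mask2 = np.zeros_like(enhanced)                                    # placeholder output, Model 2
merged = cv2.bitwise_or(mask1, mask2)                              # union of both models

# Remove connected components smaller than a minimum area before refinement.
n, labels, stats, _ = cv2.connectedComponentsWithStats(merged, connectivity=8)
filtered = np.zeros_like(merged)
for i in range(1, n):
    if stats[i, cv2.CC_STAT_AREA] >= 200:
        filtered[labels == i] = 255
```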

18 pages, 2884 KB  
Article
Research on Multi-Path Feature Fusion Manchu Recognition Based on Swin Transformer
by Yu Zhou, Mingyan Li, Hang Yu, Jinchi Yu, Mingchen Sun and Dadong Wang
Symmetry 2025, 17(9), 1408; https://doi.org/10.3390/sym17091408 - 29 Aug 2025
Viewed by 187
Abstract
Recognizing Manchu words can be challenging due to their complex character variations, subtle differences between similar characters, and homographic polysemy. Most studies rely on character segmentation techniques for character recognition or use convolutional neural networks (CNNs) to encode word images for word recognition. However, these methods can lead to segmentation errors or a loss of semantic information, which reduces the accuracy of word recognition. To address the limitations in the long-range dependency modeling of CNNs and enhance semantic coherence, we propose a hybrid architecture that fuses the spatial features of the original images with spectral features. Specifically, we first leverage the Short-Time Fourier Transform (STFT) to preprocess the raw input images and thereby obtain their multi-view spectral features. Then, we leverage a primary CNN block and a pair of symmetric CNN blocks to construct a symmetric spectral enhancement module, which is used to encode the raw input features and the multi-view spectral features. Subsequently, we design a feature fusion module via Swin Transformer to fuse the multi-view spectral embeddings and concatenate them with the raw input embedding. Finally, we leverage a Transformer decoder to obtain the target output. We conducted extensive experiments on Manchu word benchmark datasets to evaluate the effectiveness of our proposed framework. The experimental results demonstrated that our framework performs robustly in word recognition tasks and exhibits excellent generalization capabilities. Additionally, our model outperformed other baseline methods in multiple writing-style font-recognition tasks. Full article
(This article belongs to the Section Computer)
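
One possible reading of the STFT preprocessing above is to treat each pixel row of the word image as a 1-D signal and stack the STFT magnitudes; the SciPy sketch below illustrates that interpretation only. The image size, window length, and overlap are assumptions, not the authors' settings.

```python
# Row-wise STFT magnitude features from a placeholder grayscale word image.
import numpy as np
from scipy.signal import stft

image = np.random.rand(48, 256)                    # placeholder grayscale word image
spectra = []
for row in image:                                  # STFT along the writing direction
    _, _, Z = stft(row, nperseg=32, noverlap=16)
    spectra.append(np.abs(Z))                      # magnitude spectrum per row
spectral_view = np.stack(spectra)                  # (rows, freq_bins, time_frames)
print(spectral_view.shape)
```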

24 pages, 2282 KB  
Article
Top-k Bottom All but σ Loss Strategy for Medical Image Segmentation
by Corneliu Florea, Laura Florea and Constantin Vertan
Diagnostics 2025, 15(17), 2189; https://doi.org/10.3390/diagnostics15172189 - 29 Aug 2025
Viewed by 292
Abstract
Background/Objectives: In this study, we approach the problem of medical image segmentation by introducing a new loss function envelope derived from the Top-k loss strategy. We exploit the fact that, for semantic segmentation, the training loss is computed at two levels, more specifically at pixel level and at image level. Quite often, the envisaged problem has particularities that include noisy annotation at pixel level and limited data, but with accurate annotations at image level. Methods: To address the mentioned issues, the Top-k strategy at image level and the “Bottom all but σ” strategy at pixel level are adopted. To deal with the discontinuities of the differentials faced in automatic learning, a derivative smoothing procedure is introduced. Results: The method is thoroughly and successfully tested (in conjunction with a variety of backbone models) on several medical image segmentation tasks performed on a variety of image acquisition types and human body regions. We present burned skin area segmentation in standard color images, segmentation of fetal abdominal structures in ultrasound images, and ventricle and myocardium segmentation in cardiac MRI images, in all cases yielding performance improvements. Conclusions: The proposed novel mechanism enhances model training by selectively emphasizing certain loss values through the use of two complementary strategies. The major benefits of the approach are clear in challenging scenarios, where the segmentation problem is inherently difficult or where the quality of pixel-level annotations is degraded by noise or inconsistencies. The proposed approach performs equally well in both convolutional neural network (CNN) and vision transformer (ViT) architectures. Full article
(This article belongs to the Special Issue 3rd Edition: AI/ML-Based Medical Image Processing and Analysis)
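
A minimal sketch of the two selection ideas described above, under the assumption that "Bottom all but σ" means discarding the σ largest per-pixel losses in each image and Top-k means back-propagating only through the k hardest images. The derivative smoothing procedure and the authors' exact formulation are not reproduced.

```python
# Hedged sketch: per-pixel loss selection ("all but the sigma largest") followed by
# image-level Top-k selection, on losses of shape (batch, H*W).
import torch

def topk_bottom_all_but_sigma(pixel_losses: torch.Tensor, k: int, sigma: int) -> torch.Tensor:
    # Pixel level: sort ascending and drop the sigma largest (likely noisy) pixels.
    sorted_px, _ = torch.sort(pixel_losses, dim=1)
    kept = sorted_px[:, : pixel_losses.size(1) - sigma]
    image_losses = kept.mean(dim=1)                      # one loss per image
    # Image level: keep only the k highest image losses.
    topk_vals, _ = torch.topk(image_losses, k)
    return topk_vals.mean()

losses = torch.rand(8, 128 * 128, requires_grad=True)    # placeholder per-pixel losses
loss = topk_bottom_all_but_sigma(losses, k=4, sigma=100)
loss.backward()
print(loss.item())
```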

26 pages, 5066 KB  
Article
DSM-Seg: A CNN-RWKV Hybrid Framework for Forward-Looking Sonar Image Segmentation in Deep-Sea Mining
by Xinran Liu, Jianmin Yang, Enhua Zhang, Wenhao Xu and Changyu Lu
Remote Sens. 2025, 17(17), 2997; https://doi.org/10.3390/rs17172997 - 28 Aug 2025
Viewed by 291
Abstract
Accurate and real-time environmental perception is essential for the safe and efficient execution of deep-sea mining operations. Semantic segmentation of forward-looking sonar (FLS) images plays a pivotal role in enabling environmental awareness for deep-sea mining vehicles (DSMVs), but remains challenging due to strong acoustic noise, blurred object boundaries, and long-range semantic dependencies. To address these issues, this study proposes DSM-Seg, a novel hybrid segmentation architecture combining Convolutional Neural Networks (CNNs) and Receptance Weighted Key-Value (RWKV) modeling. The architecture integrates a Physical Prior-Based Semantic Guidance Module (PSGM), which utilizes sonar-specific physical priors to produce high-confidence semantic guidance maps, thereby enhancing the delineation of target boundaries. In addition, a RWKV-Based Global Fusion with Semantic Constraints (RGFSC) module is introduced to suppress cross-regional interference in long-range dependency modeling and achieve the effective fusion of local and global semantic information. Extensive experiments on both a self-collected seabed terrain dataset and a public marine debris dataset demonstrate that DSM-Seg significantly improves segmentation accuracy under complex conditions while satisfying real-time performance requirements. These results highlight the potential of the proposed method to support intelligent environmental perception in DSMV applications. Full article
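
As a very loose illustration of guidance-map fusion in the spirit of the description above, the sketch below gates CNN feature maps with a single-channel, prior-derived guidance map before further decoding. The actual PSGM and RGFSC modules (and the RWKV component) are not reproduced; all shapes and layers are assumptions.

```python
# Generic guidance-gated feature fusion sketch in PyTorch.
import torch
import torch.nn as nn

class GuidedFusion(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        self.proj = nn.Conv2d(1, channels, kernel_size=1)   # lift guidance map to feature depth
        self.mix = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, feats, guidance):                     # feats: (B,C,H,W), guidance: (B,1,H,W)
        gate = torch.sigmoid(self.proj(guidance))           # confidence in [0, 1]
        return self.mix(feats * gate + feats)               # emphasize guided regions

feats = torch.randn(2, 64, 128, 128)
guidance = torch.rand(2, 1, 128, 128)                        # placeholder prior-based map
print(GuidedFusion()(feats, guidance).shape)                 # torch.Size([2, 64, 128, 128])
```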

27 pages, 7855 KB  
Article
Design of an Automated System for Classifying Maturation Stages of Erythrina edulis Beans Using Computer Vision and Convolutional Neural Networks
by Hector Pasache, Cristian Tuesta and Carlos Inga
AgriEngineering 2025, 7(9), 277; https://doi.org/10.3390/agriengineering7090277 - 27 Aug 2025
Viewed by 491
Abstract
Erythrina edulis, commonly known as pajuro, is a large leguminous plant native to the Amazon region of Peru. Its seeds are valued for their high protein content and their potential to enhance food security in rural communities. However, the current methods of harvesting and sorting are entirely manual, making the process labor-intensive, time-consuming, and subject to high variability, particularly in industrial contexts. A custom lightweight convolutional neural network (CNN) was developed from scratch and optimized specifically for real-time execution on embedded hardware. The model employs ReLU activation, Adam optimization, and a SoftMax output layer to enable efficient and accurate classification. The system employs a fixed-region segmentation strategy to prevent overcounting and utilizes GPIO-based control on a Raspberry Pi 5 to synchronize seed classification with physical sorting in real time. Seeds identified as defective are automatically removed via a servo-controlled ejection mechanism. The integrated system combines object detection, image processing, and real-time actuation, achieving a classification accuracy exceeding 99.6% and an average processing time of 12.4 milliseconds per seed. The proposed solution contributes to the industrial automation of pajuro sorting and provides a scalable framework for color-based grain classification applicable to a wide range of agricultural products. Full article
(This article belongs to the Special Issue Implementation of Artificial Intelligence in Agriculture)
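
The sketch below shows a lightweight classifier of the kind described above: a small CNN trained with Adam, with a softmax output over maturation classes. The layer sizes, class count, and input resolution are illustrative assumptions, not the authors' architecture, and the GPIO/servo control is omitted.

```python
# Minimal lightweight CNN classifier sketch (PyTorch): ReLU activations, Adam optimizer,
# softmax probabilities at inference.
import torch
import torch.nn as nn

classes = 3                                   # hypothetical number of maturation stages
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, 64), nn.ReLU(),
    nn.Linear(64, classes),                   # logits; softmax applied at inference
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()             # expects raw logits

x = torch.randn(8, 3, 64, 64)                 # batch of seed crops (placeholder)
y = torch.randint(0, classes, (8,))
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
probs = torch.softmax(model(x), dim=1)        # class probabilities for sorting decisions
```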

21 pages, 3725 KB  
Article
Pruning-Friendly RGB-T Semantic Segmentation for Real-Time Processing on Edge Devices
by Jun Young Hwang, Youn Joo Lee, Ho Gi Jung and Jae Kyu Suhr
Electronics 2025, 14(17), 3408; https://doi.org/10.3390/electronics14173408 - 27 Aug 2025
Viewed by 226
Abstract
RGB-T semantic segmentation, which uses thermal and RGB images simultaneously, is being actively researched as a way to robustly recognize the surroundings of vehicles under challenging lighting and weather conditions, and such networks must operate in real time on edge devices. Since the transformer-based approaches adopted in most recent RGB-T semantic segmentation studies are very difficult to run on edge devices, this paper considers only CNN-based RGB-T semantic segmentation networks that can run on edge devices in real time. Although EAEFNet shows the best performance among CNN-based networks on edge devices, its inference speed is too slow for real-time operation. Furthermore, even when channel pruning is applied, the speed improvement is minimal. An analysis of EAEFNet identifies the intermediate fusion of RGB and thermal features and the high complexity of the decoder as the main causes. To address these issues, this paper proposes a network using a ResNet encoder with an early-fused four-channel input and a U-Net decoder structure. To improve decoder performance, bilinear upsampling is replaced with PixelShuffle. Additionally, mini Atrous Spatial Pyramid Pooling (ASPP) and Progressive Transposed Module (PTM) modules are applied. Since the proposed network is primarily composed of convolutional layers, channel pruning is confirmed to be effectively applicable. Consequently, channel pruning significantly improves inference speed and enables real-time operation on the neural processing unit (NPU) of edge devices. The proposed network is evaluated using the MFNet dataset, one of the most widely used public datasets for RGB-T semantic segmentation. It is shown that the proposed method achieves performance comparable to EAEFNet while operating at over 30 FPS on an embedded board equipped with the Qualcomm QCS6490 SoC. Full article
(This article belongs to the Special Issue New Insights in 2D and 3D Object Detection and Semantic Segmentation)
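
Two elements described above are sketched below: an early-fused four-channel (RGB plus thermal) input stem on a ResNet encoder and a PixelShuffle upsampling block for the decoder. This is not the full proposed network; the ASPP/PTM modules and pruning step are omitted, and the backbone choice and channel widths are assumptions.

```python
# Early-fused 4-channel stem and a PixelShuffle 2x upsampling block (PyTorch + torchvision).
import torch
import torch.nn as nn
from torchvision.models import resnet34

encoder = resnet34(weights=None)
# Replace the stem so RGB and thermal are fused at the input (4 channels).
encoder.conv1 = nn.Conv2d(4, 64, kernel_size=7, stride=2, padding=3, bias=False)

class PixelShuffleUp(nn.Module):
    """Upsample by 2x with a convolution followed by PixelShuffle instead of bilinear."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch * 4, kernel_size=3, padding=1)
        self.shuffle = nn.PixelShuffle(2)

    def forward(self, x):
        return self.shuffle(self.conv(x))     # (B, out_ch, 2H, 2W)

rgbt = torch.cat([torch.randn(1, 3, 480, 640), torch.randn(1, 1, 480, 640)], dim=1)
print(encoder.conv1(rgbt).shape)              # torch.Size([1, 64, 240, 320])
print(PixelShuffleUp(512, 256)(torch.randn(1, 512, 15, 20)).shape)  # torch.Size([1, 256, 30, 40])
```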
