Search Results (503)

Search Parameters:
Keywords = multiscale filter

16 pages, 1402 KB  
Article
A Sparse Attention Mechanism Based Redundancy-Aware Retrieval Framework for Power Grid Inspection Images
by Wei Yang, Zhenyu Chen, Xiaoguang Huang, Ming Li, Hailu Wang and Shi Liu
Electronics 2025, 14(18), 3585; https://doi.org/10.3390/electronics14183585 - 10 Sep 2025
Abstract
Driven by the rapid advancement of smart grid frameworks, the volume of visual data collected from power system diagnostic equipment has surged exponentially. A substantial portion of these images (30–40%) are redundant or highly similar, primarily due to periodic monitoring and repeated acquisitions from multiple angles. Traditional redundancy removal methods based on manual screening or single-feature matching are often inefficient and lack adaptability. In this paper, we propose a two-stage redundancy removal paradigm for power inspection imagery, which integrates abstract semantic priors with fine-grained perceptual details. The first stage combines an improved discrete cosine transform hash (DCT Hash) with the multi-scale structural similarity index (MS-SSIM) to efficiently filter redundant candidates. In the second stage, a Vision Transformer network enhanced with a hierarchical sparse attention mechanism precisely determines redundancy via cosine similarity between feature vectors. Experimental results demonstrate that the proposed method achieves an algorithm sensitivity of 0.9243, surpassing ResNet and VGG by 5.86 and 8.10 percentage points, respectively, highlighting its robustness and effectiveness in large-scale power grid redundancy detection. These results underscore the paradigm’s capability to balance efficiency and precision in complex visual inspection scenarios. Full article
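
As a rough illustration of the stage-one screening described in this abstract, the sketch below computes a pHash-style DCT hash and compares images by Hamming distance. The 32×32 input size, 8×8 coefficient block, and 6-bit threshold are illustrative assumptions, not the paper's settings; the MS-SSIM check and the ViT verification stage are elided.

```python
# Sketch of stage-one redundancy screening with a DCT hash (pHash-style),
# assuming images are already grayscale float arrays resized to 32x32.
import numpy as np
from scipy.fftpack import dct

def dct_hash(img, hash_size=8):
    """2-D DCT, keep the low-frequency block, binarize against the median."""
    coeffs = dct(dct(img, axis=0, norm='ortho'), axis=1, norm='ortho')
    low = coeffs[:hash_size, :hash_size]
    return (low > np.median(low)).flatten()

def is_redundant_candidate(img_a, img_b, max_hamming=6):
    """Flag a pair for stage-two (MS-SSIM / ViT) verification."""
    h_a, h_b = dct_hash(img_a), dct_hash(img_b)
    return np.count_nonzero(h_a != h_b) <= max_hamming

rng = np.random.default_rng(0)
frame = rng.random((32, 32))
print(is_redundant_candidate(frame, frame + 0.01 * rng.random((32, 32))))
```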

15 pages, 3118 KB  
Communication
Two-Stage Marker Detection–Localization Network for Bridge-Erecting Machine Hoisting Alignment
by Lei Li, Zelong Xiao and Taiyang Hu
Sensors 2025, 25(17), 5604; https://doi.org/10.3390/s25175604 - 8 Sep 2025
Abstract
To tackle the challenges of complex construction environment interference (e.g., lighting variations, occlusion, and marker contamination) and the demand for high-precision alignment during the hoisting process of bridge-erecting machines, this paper presents a two-stage marker detection–localization network tailored to hoisting alignment. The proposed network adopts a “coarse detection–fine estimation” phased framework; the first stage employs a lightweight detection module, which integrates a dynamic hybrid backbone (DHB) and dynamic switching mechanism to efficiently filter background noise and generate coarse localization boxes of marker regions. Specifically, the DHB dynamically switches between convolutional and Transformer branches to handle features of varying complexity (using depthwise separable convolutions from MobileNetV3 for low-level geometric features and lightweight Transformer blocks for high-level semantic features). The second stage constructs a Transformer-based homography estimation module, which leverages multi-head self-attention to capture long-range dependencies between marker keypoints and the scene context. By integrating enhanced multi-scale feature interaction and position encoding (combining the absolute position and marker geometric priors), this module achieves the end-to-end learning of precise homography matrices between markers and hoisting equipment from the coarse localization boxes. To address data scarcity in construction scenes, a multi-dimensional data augmentation strategy is developed, including random homography transformation (simulating viewpoint changes), photometric augmentation (adjusting brightness, saturation, and contrast), and background blending with bounding box extraction. Experiments on a real bridge-erecting machine dataset demonstrate that the network achieves detection accuracy (mAP) of 97.8%, a homography estimation reprojection error of less than 1.2 mm, and a processing frame rate of 32 FPS. Compared with traditional single-stage CNN-based methods, it significantly improves the alignment precision and robustness in complex environments, offering reliable technical support for the precise control of automated hoisting in bridge-erecting machines. Full article
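
The headline metric here is homography reprojection error. A minimal NumPy sketch of that measurement follows, with an arbitrary example matrix; the detection and estimation networks themselves are elided.

```python
import numpy as np

def apply_homography(H, pts):
    """Map Nx2 points through a 3x3 homography (with homogeneous divide)."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    proj = pts_h @ H.T
    return proj[:, :2] / proj[:, 2:3]

def reprojection_error(H, src, dst):
    """Mean Euclidean distance between projected source and target keypoints."""
    return float(np.linalg.norm(apply_homography(H, src) - dst, axis=1).mean())

src = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
H = np.array([[1.0, 0.02, 0.5], [0.01, 1.0, -0.3], [0.0, 0.0, 1.0]])
print(reprojection_error(H, src, apply_homography(H, src)))  # ~0 by construction
```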

31 pages, 8445 KB  
Article
HIRD-Net: An Explainable CNN-Based Framework with Attention Mechanism for Diabetic Retinopathy Diagnosis Using CLAHE-D-DoG Enhanced Fundus Images
by Muhammad Hassaan Ashraf, Muhammad Nabeel Mehmood, Musharif Ahmed, Dildar Hussain, Jawad Khan, Younhyun Jung, Mohammed Zakariah and Deema Mohammed AlSekait
Life 2025, 15(9), 1411; https://doi.org/10.3390/life15091411 - 8 Sep 2025
Viewed by 255
Abstract
Diabetic Retinopathy (DR) is a leading cause of vision impairment globally, underscoring the need for accurate and early diagnosis to prevent disease progression. Although fundus imaging serves as a cornerstone of Computer-Aided Diagnosis (CAD) systems, several challenges persist, including lesion scale variability, blurry morphological patterns, inter-class imbalance, limited labeled datasets, and computational inefficiencies. To address these issues, this study proposes an end-to-end diagnostic framework that integrates an enhanced preprocessing pipeline with a novel deep learning architecture, Hierarchical-Inception-Residual-Dense Network (HIRD-Net). The preprocessing stage combines Contrast Limited Adaptive Histogram Equalization (CLAHE) with Dilated Difference of Gaussian (D-DoG) filtering to improve image contrast and highlight fine-grained retinal structures. HIRD-Net features a hierarchical feature fusion stem alongside multiscale, multilevel inception-residual-dense blocks for robust representation learning. The Squeeze-and-Excitation Channel Attention (SECA) is introduced before each Global Average Pooling (GAP) layer to refine the Feature Maps (FMs). It further incorporates four GAP layers for multi-scale semantic aggregation, employs the Hard-Swish activation to enhance gradient flow, and utilizes the Focal Loss function to mitigate class imbalance issues. Experimental results on the IDRiD-APTOS2019, DDR, and EyePACS datasets demonstrate that the proposed framework achieves 93.46%, 82.45% and 79.94% overall classification accuracy using only 4.8 million parameters, highlighting its strong generalization capability and computational efficiency. Furthermore, to ensure transparent predictions, an Explainable AI (XAI) approach known as Gradient-weighted Class Activation Mapping (Grad-CAM) is employed to visualize HIRD-Net’s decision-making process. Full article
(This article belongs to the Special Issue Advanced Machine Learning for Disease Prediction and Prevention)
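
A minimal sketch of the CLAHE + difference-of-Gaussians preprocessing named in this abstract, assuming OpenCV is available. The clip limit and sigmas are illustrative, and the dilation aspect of D-DoG is approximated here by simply widening the second Gaussian.

```python
# Sketch of the preprocessing stage: CLAHE followed by a difference-of-Gaussians
# band-pass that highlights fine retinal structure; parameters are assumptions.
import cv2
import numpy as np

def clahe_ddog(gray_u8, clip=2.0, tile=8, sigma=1.0, k=3.0):
    clahe = cv2.createCLAHE(clipLimit=clip, tileGridSize=(tile, tile))
    eq = clahe.apply(gray_u8).astype(np.float32)
    fine = cv2.GaussianBlur(eq, (0, 0), sigma)
    coarse = cv2.GaussianBlur(eq, (0, 0), k * sigma)
    return fine - coarse  # band-pass response emphasizing vessels and lesions

img = (np.random.default_rng(0).random((128, 128)) * 255).astype(np.uint8)
print(clahe_ddog(img).shape)
```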

26 pages, 5655 KB  
Article
A Hierarchical Multi-Feature Point Cloud Lithology Identification Method Based on Feature-Preserved Compressive Sampling (FPCS)
by Xiaolei Duan, Ran Jing, Yanlin Shao, Yuangang Liu, Binqing Gan, Peijin Li and Longfan Li
Sensors 2025, 25(17), 5549; https://doi.org/10.3390/s25175549 - 5 Sep 2025
Viewed by 542
Abstract
Lithology identification is a critical technology for geological resource exploration and engineering safety assessment. However, traditional methods suffer from insufficient feature representation and low classification accuracy due to challenges such as weathering, vegetation cover, and spectral overlap in complex sedimentary rock regions. This study proposes a hierarchical multi-feature random forest algorithm based on Feature-Preserved Compressive Sampling (FPCS). Using 3D laser point cloud data from the Manas River outcrop in the southern margin of the Junggar Basin as the test area, we integrate graph signal processing and multi-scale feature fusion to construct a high-precision lithology identification model. The FPCS method establishes a geologically adaptive graph model constrained by geodesic distance and gradient-sensitive weighting, employing a three-tier graph filter bank (low-pass, band-pass, and high-pass) to extract macroscopic morphology, interface gradients, and microscopic fracture features of rock layers. A dynamic gated fusion mechanism optimizes multi-level feature weights, significantly improving identification accuracy in lithological transition zones. Experimental results on five million test samples demonstrate an overall accuracy (OA) of 95.6% and a mean accuracy (mAcc) of 94.3%, representing improvements of 36.1% and 20.5%, respectively, over the PointNet model. These findings confirm the robust engineering applicability of the FPCS-based hierarchical multi-feature approach for point cloud lithology identification. Full article
(This article belongs to the Section Remote Sensors)
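
A toy sketch of the three-tier graph filter bank idea on a random point set: build a kNN graph, take the Laplacian eigenbasis, and split a graph signal into low-, band-, and high-pass components. The dense eigendecomposition, Gaussian edge weights, and quantile cutoffs are simplifications for illustration, not the FPCS construction with geodesic and gradient-sensitive weighting.

```python
# Toy three-tier (low/band/high-pass) graph filter bank on point heights.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
pts = rng.random((200, 3))
signal = pts[:, 2]  # treat the height channel as a graph signal

# kNN adjacency and combinatorial Laplacian
k = 8
dist, idx = cKDTree(pts).query(pts, k=k + 1)
W = np.zeros((len(pts), len(pts)))
for i, (nbrs, ds) in enumerate(zip(idx[:, 1:], dist[:, 1:])):
    W[i, nbrs] = np.exp(-ds**2)
W = np.maximum(W, W.T)
L = np.diag(W.sum(1)) - W

vals, vecs = np.linalg.eigh(L)
coeffs = vecs.T @ signal
lo, hi = np.quantile(vals, [0.2, 0.8])
bands = {
    "low": coeffs * (vals <= lo),                   # macroscopic morphology
    "band": coeffs * ((vals > lo) & (vals < hi)),   # interface gradients
    "high": coeffs * (vals >= hi),                  # microscopic detail/noise
}
features = {name: vecs @ c for name, c in bands.items()}
print({name: float(np.abs(f).mean()) for name, f in features.items()})
```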

7 pages, 561 KB  
Proceeding Paper
Hybrid 3D Mesh Reconstruction Models of CT Images for Deep Learning Based Classification of Kidney Tumors
by Muhammed Ahmet Demirtaş, Alparslan Burak İnner and Adnan Kavak
Eng. Proc. 2025, 104(1), 79; https://doi.org/10.3390/engproc2025104079 - 4 Sep 2025
Viewed by 179
Abstract
We present a comparative analysis of three hybrid methodologies for transforming 3D kidney tumor segmentations of volumetric NIfTI data into highly accurate network representations. Exploiting the KiTS23 dataset, we evaluate edge-preserving reconstruction pipelines integrating anisotropic diffusion, multiscale Gaussian filtering and KNN-based network optimisation. Model 1 uses Gaussian smoothing with Marching Cubes, while Model 2 uses spline interpolation and Perona-Malik filtering for improved resolution. Model 3 extends this structure with normal sensitive vertex smoothing to preserve critical anatomical interfaces. Quantitative metrics (Dice score, HD95) demonstrated the advantage of Model 3, which achieved a 22% reduction in the Hausdorff distance error rate compared to conventional methods while maintaining segmentation accuracy (Dice > 0.92). The proposed unsupervised pipeline bridges the gap between clinical interpretability and computational accuracy, providing a robust infrastructure for further applications in surgical planning and deep learning-based classification. Full article
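
Model 1's pipeline (Gaussian smoothing followed by Marching Cubes) can be sketched directly with SciPy and scikit-image; the sigma, iso-level, and synthetic sphere volume below are stand-ins for the paper's KiTS23 segmentation masks.

```python
# Sketch of a Model-1 style pipeline: smooth a binary segmentation volume,
# then extract a surface mesh with Marching Cubes; parameters are assumptions.
import numpy as np
from scipy.ndimage import gaussian_filter
from skimage.measure import marching_cubes

z, y, x = np.mgrid[-32:32, -32:32, -32:32]
mask = ((x**2 + y**2 + z**2) < 20**2).astype(np.float32)  # toy "tumor" volume

smoothed = gaussian_filter(mask, sigma=1.5)
verts, faces, normals, _ = marching_cubes(smoothed, level=0.5)
print(verts.shape, faces.shape)
```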

21 pages, 4900 KB  
Article
RingFormer-Seg: A Scalable and Context-Preserving Vision Transformer Framework for Semantic Segmentation of Ultra-High-Resolution Remote Sensing Imagery
by Zhan Zhang, Daoyu Shu, Guihe Gu, Wenkai Hu, Ru Wang, Xiaoling Chen and Bingnan Yang
Remote Sens. 2025, 17(17), 3064; https://doi.org/10.3390/rs17173064 - 3 Sep 2025
Viewed by 569
Abstract
Semantic segmentation of ultra-high-resolution remote sensing (UHR-RS) imagery plays a critical role in land use and land cover analysis, yet it remains computationally intensive due to the enormous input size and high spatial complexity. Existing studies have commonly employed strategies such as patch-wise processing, multi-scale model architectures, lightweight networks, and representation sparsification to reduce resource demands, but they have often struggled to maintain long-range contextual awareness and scalability for inputs of arbitrary size. To address this, we propose RingFormer-Seg, a scalable Vision Transformer framework that enables long-range context learning through multi-device parallelism in UHR-RS image segmentation. RingFormer-Seg decomposes the input into spatial subregions and processes them through a distributed three-stage pipeline. First, the Saliency-Aware Token Filter (STF) selects informative tokens to reduce redundancy. Next, the Efficient Local Context Module (ELCM) enhances intra-region features via memory-efficient attention. Finally, the Cross-Device Context Router (CDCR) exchanges token-level information across devices to capture global dependencies. Fine-grained detail is preserved through the residual integration of unselected tokens, and a hierarchical decoder generates high-resolution segmentation outputs. We conducted extensive experiments on three benchmarks covering UHR-RS images from 2048 × 2048 to 8192 × 8192 pixels. Results show that our framework achieves top segmentation accuracy while significantly improving computational efficiency across the DeepGlobe, Wuhan, and Guangdong datasets. RingFormer-Seg offers a versatile solution for UHR-RS image segmentation and demonstrates potential for practical deployment in nationwide land cover mapping, supporting informed decision-making in land resource management, environmental policy planning, and sustainable development. Full article
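
A minimal PyTorch sketch of saliency-based token filtering in the spirit of the STF: a linear scoring head ranks tokens and only the top fraction is kept for attention. The scoring head, keep ratio, and shapes are assumptions; the cross-device routing (CDCR) and residual reintegration of unselected tokens are elided.

```python
# Sketch of saliency-aware token filtering: score tokens, keep the top-k.
import torch

def filter_tokens(tokens, score_head, keep_ratio=0.25):
    # tokens: (B, N, C)
    scores = score_head(tokens).squeeze(-1)            # (B, N) saliency scores
    k = max(1, int(tokens.shape[1] * keep_ratio))
    top = scores.topk(k, dim=1).indices                # (B, k) kept positions
    idx = top.unsqueeze(-1).expand(-1, -1, tokens.shape[-1])
    kept = tokens.gather(1, idx)                       # informative tokens only
    return kept, top

tokens = torch.randn(2, 1024, 64)
head = torch.nn.Linear(64, 1)
kept, top_idx = filter_tokens(tokens, head)
print(kept.shape)  # torch.Size([2, 256, 64])
```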

18 pages, 1153 KB  
Proceeding Paper
Improved YOLOv5 Lane Line Real Time Segmentation System Integrating Seg Head Network
by Qu Feilong, Navid Ali Khan, N. Z. Jhanjhi, Farzeen Ashfaq and Trisiani Dewi Hendrawati
Eng. Proc. 2025, 107(1), 49; https://doi.org/10.3390/engproc2025107049 - 2 Sep 2025
Viewed by 270
Abstract
With the rise in motor vehicles, driving safety is a major concern, and autonomous driving technology plays a key role in enhancing safety. Vision-based lane departure warning systems are essential for accurate navigation, focusing on lane line detection. This paper reviews the development of such systems and highlights the limitations of traditional image processing. To improve lane line detection, a dataset from Roboflow Universe will be used, incorporating techniques like priority pixels, least squares fitting for positioning, and a Kalman filter for tracking. YOLOv5 will be enhanced with a diversified branch block (DBB) for better multi-scale feature extraction and an improved segmentation head inspired by YOLACT (You Only Look At CoefficienTs) for precise lane line segmentation. A multi-scale feature fusion mechanism with self-attention will be introduced to improve robustness. Experiments will demonstrate that the improved YOLOv5 outperforms other models in accuracy, recall, and mAP@0.5. Future work will focus on optimizing the model structure and enhancing the fusion mechanism for better performance. Full article
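
The classical components mentioned here, least-squares fitting and Kalman tracking, are compact enough to sketch: below, lane pixels are fit with a polynomial and the coefficients are smoothed frame to frame. The identity motion model and noise covariances are illustrative choices, not the paper's configuration.

```python
# Sketch of least-squares lane fitting plus a Kalman filter over coefficients.
import numpy as np

def fit_lane(xs, ys, degree=2):
    return np.polyfit(ys, xs, degree)  # fit x = f(y), lanes being near-vertical

class CoeffKalman:
    def __init__(self, dim, q=1e-4, r=1e-2):
        self.x = np.zeros(dim); self.P = np.eye(dim)
        self.Q = q * np.eye(dim); self.R = r * np.eye(dim)
    def update(self, z):
        P_pred = self.P + self.Q                        # identity motion model
        K = P_pred @ np.linalg.inv(P_pred + self.R)     # Kalman gain
        self.x = self.x + K @ (z - self.x)
        self.P = (np.eye(len(z)) - K) @ P_pred
        return self.x

kf = CoeffKalman(dim=3)
ys = np.linspace(0, 100, 50)
for t in range(5):
    xs = 0.01 * ys**2 + 2 * ys + 5 + np.random.randn(50)  # noisy lane pixels
    print(kf.update(fit_lane(xs, ys)))
```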

22 pages, 1243 KB  
Article
ProCo-NET: Progressive Strip Convolution and Frequency-Optimized Framework for Scale-Gradient-Aware Semantic Segmentation in Off-Road Scenes
by Zihang Liu, Donglin Jing and Chenxiang Ji
Symmetry 2025, 17(9), 1428; https://doi.org/10.3390/sym17091428 - 2 Sep 2025
Viewed by 346
Abstract
In off-road scenes, segmentation targets exhibit significant scale progression due to perspective depth effects from oblique viewing angles, meaning that the size of the same target undergoes continuous, boundary-less progressive changes along a specific direction. This asymmetric variation disrupts the geometric symmetry of targets, causing traditional segmentation networks to face three key challenges: (1) inefficient capture of continuous-scale features, where pyramid structures and multi-scale kernels struggle to balance computational efficiency with sufficient coverage of progressive scales; (2) degraded intra-class feature consistency, where local scale differences within targets induce semantic ambiguity; and (3) loss of high-frequency boundary information, where feature sampling operations exacerbate the blurring of progressive boundaries. To address these issues, this paper proposes the ProCo-NET framework for systematic optimization. Firstly, a Progressive Strip Convolution Group (PSCG) is designed to construct multi-level receptive field expansion through orthogonally oriented strip convolution cascading (employing symmetric processing in horizontal/vertical directions) integrated with self-attention mechanisms, enhancing perception capability for asymmetric continuous-scale variations. Secondly, an Offset-Frequency Cooperative Module (OFCM) is developed wherein a learnable offset generator dynamically adjusts sampling point distributions to enhance intra-class consistency, while a dual-channel frequency domain filter performs adaptive high-pass filtering to sharpen target boundaries. These components synergistically solve feature consistency degradation and boundary ambiguity under asymmetric changes. Experiments show that this framework significantly improves the segmentation accuracy and boundary clarity of multi-scale targets in off-road scene segmentation tasks: it achieves 71.22% MIoU on the standard RUGD dataset (0.84% higher than the existing optimal method) and 83.05% MIoU on the Freiburg_Forest dataset. In particular, the segmentation accuracy on key obstacle categories improves to 52.04% (2.7% higher than the sub-optimal model). This framework effectively compensates for the impact of asymmetric deformation through a symmetric computing mechanism. Full article
(This article belongs to the Section Computer)
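
A minimal sketch of the orthogonal strip convolutions at the core of the PSCG, assuming PyTorch: cascaded 1×k and k×1 branches expand the receptive field along each axis cheaply. The kernel lengths, residual-sum fusion, and omission of the self-attention path are assumptions.

```python
# Sketch of cascaded orthogonal strip convolutions (PSCG-style).
import torch
import torch.nn as nn

class StripConvBlock(nn.Module):
    def __init__(self, ch, k=7):
        super().__init__()
        self.horizontal = nn.Conv2d(ch, ch, (1, k), padding=(0, k // 2))
        self.vertical = nn.Conv2d(ch, ch, (k, 1), padding=(k // 2, 0))
        self.act = nn.GELU()
    def forward(self, x):
        # symmetric horizontal/vertical processing fused with a residual sum
        return self.act(x + self.horizontal(x) + self.vertical(x))

pscg = nn.Sequential(StripConvBlock(32, 7), StripConvBlock(32, 11))
print(pscg(torch.randn(1, 32, 64, 64)).shape)  # torch.Size([1, 32, 64, 64])
```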

19 pages, 2082 KB  
Article
Multi-Scale Grid-Based Semantic Surface Point Generation for 3D Object Detection
by Xin-Fu Chen, Chun-Chieh Lee, Jung-Hua Lo, Chi-Hung Chuang and Kuo-Chin Fan
Electronics 2025, 14(17), 3492; https://doi.org/10.3390/electronics14173492 - 31 Aug 2025
Viewed by 369
Abstract
3D object detection is a crucial technology in fields such as autonomous driving and robotics. As a direct representation of the 3D world, point cloud data plays a vital role in feature extraction and geometric representation. However, in real-world applications, point cloud data often suffers from occlusion, resulting in incomplete observations and degraded detection performance. Existing methods, such as PG-RCNN, generate semantic surface points within each Region of Interest (RoI) using a single grid size. However, a fixed grid scale cannot adequately capture multi-scale features. A grid that is too small may miss fine structures—especially problematic when dealing with small or sparse objects—while a grid that is too large may introduce excessive background noise, reducing the precision of feature representation. To address this issue, we propose an enhanced PG-RCNN architecture with a Multi-Scale Grid Attention Module as the core contribution. This module improves the expressiveness of point features by aggregating multi-scale information and dynamically weighting features from different grid resolutions. Using a simple linear transformation, we generate attention weights to guide the model to focus on regions that contribute more to object recognition, while effectively filtering out redundant noise. We evaluate our method on the KITTI 3D object detection validation set. Experimental results show that, compared to the original PG-RCNN, our approach improves performance on the Cyclist category by 2.66% and 2.54% in the Moderate and Hard settings, respectively. Additionally, our approach shows more stable performance on small object detection tasks, with an average improvement of 2.57%, validating the positive impact of the Multi-Scale Grid Attention Module on fine-grained geometric modeling, and highlighting the efficiency and generalizability of our model. Full article
(This article belongs to the Special Issue Digital Signal and Image Processing for Multimedia Technology)
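
The fusion step described here, attention weights produced by a simple linear transformation over per-scale features, can be sketched in a few lines of PyTorch; the channel width and number of scales below are illustrative.

```python
# Sketch of attention-weighted fusion across grid scales: a linear head maps
# each scale's pooled feature to a logit, and softmax weights blend the scales.
import torch
import torch.nn as nn

class MultiScaleGridAttention(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.score = nn.Linear(ch, 1)
    def forward(self, feats):          # feats: list of (B, C), one per scale
        stack = torch.stack(feats, dim=1)                   # (B, S, C)
        weights = torch.softmax(self.score(stack), dim=1)   # (B, S, 1)
        return (weights * stack).sum(dim=1)                 # fused (B, C)

feats = [torch.randn(4, 128) for _ in range(3)]
print(MultiScaleGridAttention(128)(feats).shape)  # torch.Size([4, 128])
```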

22 pages, 2758 KB  
Article
GPRGS: Sparse Input New View Synthesis Based on Probabilistic Modeling and Feature Regularization
by Yinshuang Qin, Gen Liu and Jian Wang
Appl. Sci. 2025, 15(17), 9422; https://doi.org/10.3390/app15179422 - 27 Aug 2025
Viewed by 466
Abstract
When the number of available training views is limited, the small quantity of images results in insufficient generation of Gaussian ellipsoids, leading to an empty Gaussian model. This constraint limits the generation of Gaussian ellipsoids within 3DGS. If the number of Gaussian ellipsoids is too low, the model is prone to overfitting and may learn incorrect scene geometry. To address this challenge, we propose 3DGS based on Gaussian probabilistic modeling and feature regularization (GPRGS). Our method employs Gaussian probabilistic modeling based on Gaussian distribution features, where we capture feature information from images and establish a Gaussian distribution to model the feature probability map. Additionally, feature regularization is introduced to enhance image features and prevent overfitting. Moreover, we introduce scale and densification thresholds and update the multi-scale densification and pruning strategy to avoid filtering out all low-opacity Gaussian points during the pruning process. We conducted evaluations for new view synthesis with both full and sparse inputs on real and synthetic datasets. The results demonstrate that GPRGS is on par with other models, and in sparse settings it achieves a slight advantage, with an improvement of approximately 4% in PSNR. Full article
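
As a rough illustration of Gaussian probabilistic modeling of image features, the sketch below fits a per-channel Gaussian to a feature map and converts log-likelihood into a probability map. The random features stand in for a real extractor, and this is not the paper's exact formulation.

```python
# Sketch: model per-pixel features with a diagonal Gaussian and score each
# pixel by likelihood to obtain a feature probability map.
import numpy as np

feat = np.random.default_rng(0).normal(size=(64, 64, 8))   # H x W x C features
flat = feat.reshape(-1, feat.shape[-1])
mu, var = flat.mean(0), flat.var(0) + 1e-6                 # fitted Gaussian

log_p = -0.5 * (((flat - mu) ** 2) / var + np.log(2 * np.pi * var)).sum(1)
prob_map = np.exp(log_p - log_p.max()).reshape(feat.shape[:2])
print(prob_map.shape, float(prob_map.max()))
```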

17 pages, 45337 KB  
Article
Contrastive Learning-Driven Image Dehazing with Multi-Scale Feature Fusion and Hybrid Attention Mechanism
by Huazhong Zhang, Jiaozhuo Wang, Xiaoguang Tu, Zhiyi Niu and Yu Wang
J. Imaging 2025, 11(9), 290; https://doi.org/10.3390/jimaging11090290 - 26 Aug 2025
Viewed by 481
Abstract
Image dehazing is critical for visual enhancement and a wide range of computer vision applications. Despite significant advancements, challenges remain in preserving fine details and adapting to diverse, non-uniformly degraded scenes. To address these issues, we propose a novel image dehazing method that introduces a contrastive learning framework, enhanced by the InfoNCE loss, to improve model robustness. In this framework, hazy images are treated as negative samples and their clear counterparts as positive samples. By optimizing the InfoNCE loss, the model is trained to maximize the similarity between positive pairs and minimize that between negative pairs, thereby improving its ability to distinguish haze artifacts from intrinsic scene features and better preserving the structural integrity of images. In addition to contrastive learning, our method integrates a multi-scale dynamic feature fusion with a hybrid attention mechanism. Specifically, we introduce dynamically adjustable frequency band filters and refine the hybrid attention module to more effectively capture fine-grained, cross-scale image details. Extensive experiments on the RESIDE-6K and RS-Haze datasets demonstrate that our approach outperforms most existing methods, offering a promising solution for practical image dehazing applications. Full article
(This article belongs to the Special Issue Advances in Machine Learning for Computer Vision Applications)
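
The InfoNCE objective used here is standard and easy to sketch: each dehazed embedding should match its clear counterpart (positive) against the other images in the batch (negatives). The temperature and embedding size below are assumptions.

```python
# Standard InfoNCE sketch over dehazed/clear embedding pairs.
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, temperature=0.07):
    a = F.normalize(anchor, dim=1)
    p = F.normalize(positive, dim=1)
    logits = a @ p.t() / temperature        # (B, B); diagonal = positive pairs
    labels = torch.arange(len(a))
    return F.cross_entropy(logits, labels)

dehazed_emb = torch.randn(8, 256)
clear_emb = dehazed_emb + 0.1 * torch.randn(8, 256)
print(float(info_nce(dehazed_emb, clear_emb)))
```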

34 pages, 2708 KB  
Article
Integrating Temporal Event Prediction and Large Language Models for Automatic Commentary Generation in Video Games
by Xuanyu Sheng, Aihe Yu, Mingfeng Zhang, Gayoung An, Jisun Park and Kyungeun Cho
Mathematics 2025, 13(17), 2738; https://doi.org/10.3390/math13172738 - 26 Aug 2025
Viewed by 618
Abstract
Game commentary enhances viewer immersion and understanding, particularly in football video games, where dynamic gameplay offers ideal conditions for automated commentary. The existing methods often rely on predefined templates and game state inputs combined with an LLM, such as GPT-3.5. However, they frequently suffer from repetitive phrasing and delayed responses. Recent studies have attempted to mitigate the response delays by employing traditional machine learning models, such as SVM and ANN, for event prediction. Nonetheless, these models fail to capture the temporal dependencies in gameplay sequences, thereby limiting their predictive performance. To address these limitations, an integrated framework is proposed, combining a lightweight convolutional model with multi-scale temporal filters (OS-CNN) for real-time event prediction and an open-source LLM (LLaMA 3.3) for dynamic commentary generation. Our method incorporates prompt engineering techniques by embedding predicted events into contextualized instruction templates, which enables the LLM to produce fluent and diverse commentary tailored to ongoing gameplay. Evaluated in the Google Research Football environment, the proposed method achieved an F1-score of 0.7470 in the balanced setting, closely matching the best-performing GRU model (0.7547) while outperforming SVM (0.5271) and Transformer (0.7344). In the more realistic Balanced–Imbalanced setting, it attained the highest F1-score of 0.8503, substantially exceeding SVM (0.4708), GRU (0.7376), and Transformer (0.5085). Additionally, it enhances the lexical diversity (Distinct-2: +32.1%) and reduces the phrase repetition by 42.3% (Self-BLEU), compared with template-based generation. These results demonstrate the effectiveness of our approach in generating context-aware, low-latency, and natural commentary suitable for real-time deployment in football video games. Full article
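
A minimal PyTorch sketch of multi-scale temporal convolution in the OS-CNN spirit: parallel Conv1d branches with different kernel sizes are concatenated, then pooled over time for event classification. Kernel sizes and channel widths are illustrative assumptions.

```python
# Sketch of multi-scale temporal filters over gameplay feature sequences.
import torch
import torch.nn as nn

class MultiScaleTemporalConv(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_sizes=(3, 7, 15)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv1d(in_ch, out_ch, k, padding=k // 2) for k in kernel_sizes
        )
    def forward(self, x):                # x: (B, C, T) feature time series
        y = torch.cat([b(x) for b in self.branches], dim=1)
        return y.mean(dim=-1)            # global average pool over time

model = MultiScaleTemporalConv(16, 32)
print(model(torch.randn(4, 16, 128)).shape)  # torch.Size([4, 96])
```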

17 pages, 588 KB  
Article
An Accurate and Efficient Diabetic Retinopathy Diagnosis Method via Depthwise Separable Convolution and Multi-View Attention Mechanism
by Qing Yang, Ying Wei, Fei Liu and Zhuang Wu
Appl. Sci. 2025, 15(17), 9298; https://doi.org/10.3390/app15179298 - 24 Aug 2025
Viewed by 444
Abstract
Diabetic retinopathy (DR), a critical ocular disease that can lead to blindness, demands early and accurate diagnosis to prevent vision loss. Current automated DR diagnosis methods face two core challenges: first, subtle early lesions such as microaneurysms are often missed due to insufficient feature extraction; second, there is a persistent trade-off between model accuracy and efficiency—lightweight architectures often sacrifice precision for real-time performance, while high-accuracy models are computationally expensive and difficult to deploy on resource-constrained edge devices. To address these issues, this study presents a novel deep learning framework integrating depthwise separable convolution and a multi-view attention mechanism (MVAM) for efficient DR diagnosis using retinal images. The framework employs multi-scale feature fusion via parallel 3 × 3 and 5 × 5 convolutions to capture lesions of varying sizes and incorporates Gabor filters to enhance vascular texture and directional lesion modeling, improving sensitivity to early structural abnormalities while reducing computational costs. Experimental results on both the diabetic retinopathy (DR) dataset and ocular disease (OD) dataset demonstrate the superiority of the proposed method: it achieves a high accuracy of 0.9697 on the DR dataset and 0.9669 on the OD dataset, outperforming traditional methods such as CNN_eye, VGG, and U-Net by more than 1 percentage point. Moreover, its training time is only half that of U-Net (on the DR dataset) and VGG (on the OD dataset), highlighting its potential for clinical DR screening. Full article
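
The efficiency backbone named here, depthwise separable convolution, factors a standard convolution into a per-channel spatial filter plus a 1×1 pointwise mix; a minimal PyTorch sketch follows. Channel sizes are illustrative, and the Gabor and multi-view attention branches are elided.

```python
# Sketch of a depthwise separable convolution block.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        # groups=in_ch makes the spatial conv operate per channel (depthwise)
        self.depthwise = nn.Conv2d(in_ch, in_ch, k, padding=k // 2, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)   # 1x1 channel mixing
    def forward(self, x):
        return self.pointwise(self.depthwise(x))

block = DepthwiseSeparableConv(32, 64)
print(block(torch.randn(1, 32, 56, 56)).shape)  # torch.Size([1, 64, 56, 56])
```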

16 pages, 1786 KB  
Article
Enhanced SSVEP Bionic Spelling via xLSTM-Based Deep Learning with Spatial Attention and Filter Bank Techniques
by Liuyuan Dong, Chengzhi Xu, Ruizhen Xie, Xuyang Wang, Wanli Yang and Yimeng Li
Biomimetics 2025, 10(8), 554; https://doi.org/10.3390/biomimetics10080554 - 21 Aug 2025
Viewed by 443
Abstract
Steady-State Visual Evoked Potentials (SSVEPs) have emerged as an efficient means of interaction in brain–computer interfaces (BCIs), achieving bioinspired efficient language output for individuals with aphasia. Addressing the underutilization of frequency information of SSVEPs and redundant computation by existing transformer-based deep learning methods, this paper analyzes signals from both the time and frequency domains, proposing a stacked encoder–decoder (SED) network architecture based on an xLSTM model and spatial attention mechanism, termed SED-xLSTM, which is the first application of xLSTM to the SSVEP speller field. This model takes the low-channel spectrogram as input and employs the filter bank technique to make full use of harmonic information. By leveraging a gating mechanism, SED-xLSTM effectively extracts and fuses high-dimensional spatial-channel semantic features from SSVEP signals. Experimental results on three public datasets demonstrate the superior performance of SED-xLSTM in terms of classification accuracy and information transfer rate, particularly outperforming existing methods under cross-validation across various temporal scales. Full article
(This article belongs to the Special Issue Exploration of Bioinspired Computer Vision and Pattern Recognition)
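
A minimal sketch of the filter bank technique for SSVEP epochs, assuming SciPy: Butterworth band-pass filters carve out harmonic sub-bands for downstream feature extraction. The band edges and filter order are common choices from the SSVEP literature, not necessarily this paper's settings.

```python
# Sketch: decompose an SSVEP epoch into harmonic sub-bands with band-pass
# filters; each sub-band becomes a channel group for the classifier.
import numpy as np
from scipy.signal import butter, filtfilt

def filter_bank(eeg, fs=250, bands=((8, 88), (16, 88), (24, 88)), order=4):
    out = []
    for lo, hi in bands:
        b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        out.append(filtfilt(b, a, eeg, axis=-1))   # zero-phase filtering
    return np.stack(out)                           # (n_bands, channels, samples)

epoch = np.random.default_rng(0).standard_normal((8, 500))  # 8 ch, 2 s @ 250 Hz
print(filter_bank(epoch).shape)  # (3, 8, 500)
```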

22 pages, 17979 KB  
Article
AFBF-YOLO: An Improved YOLO11n Algorithm for Detecting Bunch and Maturity of Cherry Tomatoes in Greenhouse Environments
by Bo-Jin Chen, Jun-Yan Bu, Jun-Lin Xia, Ming-Xuan Li and Wen-Hao Su
Plants 2025, 14(16), 2587; https://doi.org/10.3390/plants14162587 - 20 Aug 2025
Viewed by 563
Abstract
Accurate detection of cherry tomato clusters and their ripeness stages is critical for the development of intelligent harvesting systems in modern agriculture. In response to the challenges posed by occlusion, overlapping clusters, and subtle ripeness variations under complex greenhouse environments, an improved YOLO11-based deep convolutional neural network detection model, called AFBF-YOLO, is proposed in this paper. First, a dataset comprising 486 RGB images and over 150,000 annotated instances was constructed and augmented, covering four ripeness stages and fruit clusters. Then, based on YOLO11, the ACmix attention mechanism was incorporated to strengthen feature representation under occluded and cluttered conditions. Additionally, a novel neck structure, FreqFusion-BiFPN, was designed to improve multi-scale feature fusion through frequency-aware filtering. Finally, a refined loss function, Inner-Focaler-IoU, was applied to enhance bounding box localization by emphasizing inner-region overlap and focusing on difficult samples. Experimental results show that AFBF-YOLO achieves a precision of 81.2%, a recall of 81.3%, and an mAP@0.5 of 85.6%, outperforming multiple mainstream YOLO series. High accuracy across ripeness stages and low computational complexity indicate it excels in simultaneous detection of cherry tomato fruit bunches and fruit maturity, supporting automated maturity assessment and robotic harvesting in precision agriculture. Full article
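
The Inner-Focaler-IoU loss builds on plain box IoU; the sketch below computes that base quantity for axis-aligned boxes in PyTorch, without the paper's inner-region scaling or focal weighting.

```python
# Sketch of the IoU term underlying Inner-Focaler-IoU, for (x1, y1, x2, y2) boxes.
import torch

def box_iou(a, b, eps=1e-7):
    lt = torch.maximum(a[..., :2], b[..., :2])     # intersection top-left
    rb = torch.minimum(a[..., 2:], b[..., 2:])     # intersection bottom-right
    wh = (rb - lt).clamp(min=0)
    inter = wh[..., 0] * wh[..., 1]
    area_a = (a[..., 2] - a[..., 0]) * (a[..., 3] - a[..., 1])
    area_b = (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area_a + area_b - inter + eps)

pred = torch.tensor([[10.0, 10.0, 50.0, 60.0]])
gt = torch.tensor([[12.0, 8.0, 48.0, 62.0]])
print(float(box_iou(pred, gt)))
```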