Search Results (817)

Search Parameters:
Keywords = multi-task loss

30 pages, 73820 KB  
Article
Progressive Multi-Scale Perception Network for Non-Uniformly Blurred Underwater Image Restoration
by Dechuan Kong, Yandi Zhang, Xiaohu Zhao, Yanyan Wang and Yanqiang Wang
Sensors 2025, 25(17), 5439; https://doi.org/10.3390/s25175439 - 2 Sep 2025
Abstract
Underwater imaging is affected by spatially varying blur caused by water-flow turbulence, light scattering, and camera motion, resulting in severe visual quality loss and diminished performance in downstream vision tasks. Although numerous underwater image enhancement methods have been proposed, non-uniform blur under realistic underwater conditions remains largely underexplored. To bridge this gap, we propose PMSPNet, a Progressive Multi-Scale Perception Network designed to handle underwater non-uniform blur. The network integrates a Hybrid Interaction Attention Module to enable precise modeling of feature ambiguity directions and regional disparities. In addition, a Progressive Motion-Aware Perception Branch is employed to capture spatial orientation variations in blurred regions, progressively refining the localization of blur-related features. A Progressive Feature Feedback Block is incorporated to enhance reconstruction quality by leveraging iterative feature feedback across scales. To facilitate robust evaluation, we construct the Non-uniform Underwater Blur Benchmark, which comprises diverse real-world blur patterns. Extensive experiments on multiple real-world underwater datasets demonstrate that PMSPNet consistently surpasses state-of-the-art methods, achieving an average of 25.51 dB PSNR at an inference speed of 0.01 s, providing high-quality visual input from underwater sensors for underwater robots, marine ecological monitoring, and inspection tasks.
(This article belongs to the Special Issue Deep Learning for Perception and Recognition: Method and Applications)
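The 25.51 dB figure quoted above is a peak signal-to-noise ratio (PSNR), the standard fidelity metric for restoration results. A minimal sketch of how such a score is computed; the `peak=1.0` dynamic range and the synthetic test images are illustrative assumptions, not the paper's data:

```python
import numpy as np

def psnr(reference, restored, peak=1.0):
    """Peak signal-to-noise ratio in dB between two images in [0, peak]."""
    mse = np.mean((reference.astype(np.float64) - restored.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images: infinite PSNR
    return 10.0 * np.log10(peak ** 2 / mse)

# Illustrative usage: a clean image vs. a lightly noised copy.
rng = np.random.default_rng(0)
clean = rng.random((32, 32))
noisy = np.clip(clean + rng.normal(0, 0.05, clean.shape), 0, 1)
score = psnr(clean, noisy)
```

Higher is better: restoration papers typically report averages over a test set, exactly as the abstract does.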

22 pages, 8021 KB  
Article
Multi-Task Semi-Supervised Approach for Counting Cones in Adaptive Optics Images
by Vidya Bommanapally, Amir Akhavanrezayat, Parvathi Chundi, Quan Dong Nguyen and Mahadevan Subramaniam
Algorithms 2025, 18(9), 552; https://doi.org/10.3390/a18090552 - 2 Sep 2025
Abstract
Counting and density estimation of cone cells using adaptive optics (AO) imaging plays an important role in the clinical management of retinal diseases. This paper describes a novel deep learning approach to cone counting that requires minimal manual labeling of cone cells in AO images. We propose a hybrid multi-task semi-supervised learning (MTSSL) framework that trains on unlabeled and labeled data simultaneously. On the unlabeled images, the model learns structural and relational features by employing two self-supervised pretext tasks—image inpainting (IP) and learning-to-rank (L2R). At the same time, it leverages a small set of labeled examples to supervise a density estimation head for cone counting. By jointly minimizing the image reconstruction loss, the ranking loss, and the supervised density-map loss, our approach harnesses the rich information in unlabeled data to learn feature representations and directly incorporates ground-truth annotations to guide accurate density prediction and counts. Experiments were conducted on a dataset of AO images of 120 subjects captured with a wide-field retinal camera (rtx1). MTSSL draws strength from hybrid self-supervised pretext tasks of generative and predictive pretraining, which aid in learning the global and local context required for counting cones. The results show that the proposed MTSSL approach significantly outperforms the individual self-supervised pipelines, improving the RMSE for cone counting by a factor of 2.
(This article belongs to the Special Issue Advanced Machine Learning Algorithms for Image Processing)
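The MTSSL objective above jointly minimizes a reconstruction loss, a ranking loss, and a density-map loss. A minimal sketch of such a combined objective; the specific loss forms, the crop-vs-full-image ranking constraint, and the equal weights are assumptions for illustration, not the authors' exact formulation:

```python
import numpy as np

def inpainting_loss(pred_patch, true_patch):
    # Self-supervised reconstruction term (MSE over the masked region).
    return np.mean((pred_patch - true_patch) ** 2)

def ranking_loss(count_full, count_crop, margin=0.0):
    # Learning-to-rank term: a crop can never contain more cones than
    # the full image, so penalise violations of that ordering.
    return max(0.0, count_crop - count_full + margin)

def density_loss(pred_density, true_density):
    # Supervised density-map term on the small labelled subset.
    return np.mean((pred_density - true_density) ** 2)

def mtssl_loss(pred_patch, true_patch, count_full, count_crop,
               pred_density, true_density, w=(1.0, 1.0, 1.0)):
    # Weighted sum of the three terms; weights are a hypothetical choice.
    return (w[0] * inpainting_loss(pred_patch, true_patch)
            + w[1] * ranking_loss(count_full, count_crop)
            + w[2] * density_loss(pred_density, true_density))
```

The ranking term contributes zero whenever the predicted crop count respects the ordering, so gradient pressure comes only from violations.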

22 pages, 1243 KB  
Article
ProCo-NET: Progressive Strip Convolution and Frequency-Optimized Framework for Scale-Gradient-Aware Semantic Segmentation in Off-Road Scenes
by Zihang Liu, Donglin Jing and Chenxiang Ji
Symmetry 2025, 17(9), 1428; https://doi.org/10.3390/sym17091428 - 2 Sep 2025
Abstract
In off-road scenes, segmentation targets exhibit significant scale progression due to perspective depth effects from oblique viewing angles: the size of the same target undergoes continuous, boundary-less progressive change along a specific direction. This asymmetric variation disrupts the geometric symmetry of targets, causing traditional segmentation networks to face three key challenges: (1) inefficient capture of continuous-scale features, where pyramid structures and multi-scale kernels struggle to balance computational efficiency with sufficient coverage of progressive scales; (2) degraded intra-class feature consistency, where local scale differences within targets induce semantic ambiguity; and (3) loss of high-frequency boundary information, where feature sampling operations exacerbate the blurring of progressive boundaries. To address these issues, this paper proposes the ProCo-NET framework for systematic optimization. First, a Progressive Strip Convolution Group (PSCG) is designed to construct multi-level receptive field expansion through orthogonally oriented strip convolution cascading (employing symmetric processing in the horizontal/vertical directions) integrated with self-attention mechanisms, enhancing perception of asymmetric continuous-scale variations. Second, an Offset-Frequency Cooperative Module (OFCM) is developed in which a learnable offset generator dynamically adjusts sampling-point distributions to enhance intra-class consistency, while a dual-channel frequency-domain filter performs adaptive high-pass filtering to sharpen target boundaries. These components synergistically resolve feature consistency degradation and boundary ambiguity under asymmetric changes. Experiments show that the framework significantly improves the segmentation accuracy and boundary clarity of multi-scale targets in off-road scene segmentation: it achieves 71.22% mIoU on the standard RUGD dataset (0.84% higher than the best existing method) and 83.05% mIoU on the Freiburg_Forest dataset. In particular, the segmentation accuracy of key obstacle categories improves to 52.04% (2.7% higher than the second-best model). The framework effectively compensates for asymmetric deformation through a symmetric computing mechanism.
(This article belongs to the Section Computer)
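The "orthogonally oriented strip convolution cascading" above rests on a standard separable-convolution idea: a horizontal 1×k strip followed by a vertical k×1 strip covers a k×k receptive field at roughly O(k) cost per pixel instead of O(k²). A sketch with fixed averaging kernels; the kernel values and k=5 are illustrative, since PSCG learns its kernels and adds self-attention on top:

```python
import numpy as np

def conv2d_same(img, kernel):
    """Naive 'same' 2-D cross-correlation with zero padding."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    out = np.zeros_like(img, dtype=np.float64)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

def strip_conv_pair(img, k=5):
    """Cascade a horizontal 1xk strip with a vertical kx1 strip.
    Together they cover a k x k receptive field at O(k) cost per pixel."""
    horiz = np.full((1, k), 1.0 / k)  # averaging kernels for illustration
    vert = np.full((k, 1), 1.0 / k)
    return conv2d_same(conv2d_same(img, horiz), vert)
```

For a separable kernel the cascade reproduces the full k×k convolution exactly, which is what makes strip cascading an efficient route to large receptive fields.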

22 pages, 4678 KB  
Article
KDiscShapeNet: A Structure-Aware Time Series Clustering Model with Supervised Contrastive Learning
by Xi Chen, Yufan Jiang, Yingming Zhang and Chunhe Song
Mathematics 2025, 13(17), 2814; https://doi.org/10.3390/math13172814 - 1 Sep 2025
Abstract
Time series clustering plays a vital role in various analytical and pattern recognition tasks by partitioning structurally similar sequences into semantically coherent groups, thereby facilitating downstream analysis. However, building high-quality clustering models remains challenging due to three key issues: (i) capturing dynamic shape variations across sequences, (ii) ensuring discriminative cluster structures, and (iii) enabling end-to-end optimization. To address these challenges, we propose KDiscShapeNet, a structure-aware clustering framework that systematically extends the classical k-Shape model. First, to enhance temporal structure modeling, we adopt a Kolmogorov–Arnold Network (KAN) as the encoder, which leverages high-order functional representations to effectively capture elastic distortions and multi-scale shape features of time series. Second, to improve intra-cluster compactness and inter-cluster separability, we incorporate a dual-loss constraint combining Center Loss and Supervised Contrastive Loss, enhancing the discriminative structure of the embedding space. Third, to overcome the non-differentiability of traditional k-Shape clustering, we introduce Differentiable k-Shape, embedding the normalized cross-correlation (NCC) metric into a differentiable framework that enables joint training of the encoder and the clustering module. We evaluate KDiscShapeNet on nine benchmark datasets from the UCR Archive and the ETT suite, spanning healthcare, industrial monitoring, energy forecasting, and astronomy. On the Trace dataset, it achieves an ARI of 0.916, NMI of 0.927, and Silhouette score of 0.931; on the large-scale ETTh1 dataset, it improves ARI by 5.8% and NMI by 17.4% over the best baseline. Statistical tests confirm the significance of these improvements (p < 0.01). Overall, the results highlight the robustness and practical utility of KDiscShapeNet, offering a novel and interpretable framework for time series clustering.
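The NCC metric at the heart of the Differentiable k-Shape step above is the shift-maximized normalized cross-correlation behind k-Shape's shape-based distance. A minimal, non-differentiable sketch; circular shifts are used here for brevity (k-Shape proper uses zero-padded shifts), and making this computation differentiable is precisely the paper's contribution, not shown here:

```python
import numpy as np

def ncc(x, y):
    """Normalized cross-correlation between two sequences at zero shift."""
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return float(np.dot(x, y) / denom) if denom else 0.0

def shape_based_distance(x, y):
    """k-Shape style distance: 1 minus the maximum NCC over shifts of y.
    Sequences identical up to a shift (and scale) give distance 0."""
    best = max(ncc(x, np.roll(y, s)) for s in range(len(y)))
    return 1.0 - best
```

Because NCC is scale-invariant and shift-maximized, clustering with this distance groups sequences by shape rather than amplitude or phase.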

21 pages, 2676 KB  
Article
DT-HRL: Mastering Long-Sequence Manipulation with Reimagined Hierarchical Reinforcement Learning
by Junyang Zhang, Yilin Zhang, Honglin Sun, Yifei Zhang and Kenji Hashimoto
Biomimetics 2025, 10(9), 577; https://doi.org/10.3390/biomimetics10090577 - 1 Sep 2025
Abstract
Robotic manipulators in warehousing and logistics often face complex tasks involving multiple steps, frequent task switching, and long-term dependencies. Inspired by the hierarchical structure of human motor control, this paper proposes a Hierarchical Reinforcement Learning (HRL) framework built around a multi-task goal-conditioned Decision Transformer (MTGC-DT). The high-level policy treats the Markov decision process as a sequence modeling task, allowing the agent to manage temporal dependencies. The low-level policy consists of parameterized action primitives that handle physical execution. This design improves long-term reasoning and generalization. The method is evaluated on two common logistics manipulation tasks, sequential stacking and spatial sorting, with sparse rewards and a low-quality dataset. The main contributions are an HRL framework that integrates a Decision Transformer (DT) with task and goal embeddings, along with a path-efficiency loss (PEL) correction, and a parameterized, learnable primitive skill library for low-level control that enhances generalization and reusability. Experimental results demonstrate that the proposed Decision Transformer-based Hierarchical Reinforcement Learning (DT-HRL) achieves an over 10% higher success rate and over 8% higher average reward than the baseline, and a normalized score increase of over 2% in the ablation experiments.
(This article belongs to the Section Locomotion and Bioinspired Robotics)

22 pages, 3866 KB  
Article
Development of a BIM-Based Metaverse Virtual World for Collaborative Architectural Design
by David Stephen Panya, Taehoon Kim, Soon Min Hong and Seungyeon Choo
Architecture 2025, 5(3), 71; https://doi.org/10.3390/architecture5030071 - 1 Sep 2025
Abstract
The rapid evolution of the metaverse is driving the development of new digital design tools that integrate Computer-Aided Design (CAD) and Building Information Modeling (BIM) technologies. Core technologies such as Virtual Reality (VR), Augmented Reality (AR), and Mixed Reality (MR) are increasingly combined with BIM to enhance collaboration and innovation in design and construction workflows. However, current BIM–VR integration often remains limited to isolated tasks, lacking persistent, multi-user environments that support continuous project collaboration. This study proposes a BIM-based Virtual World (VW) framework that addresses these limitations by creating an immersive, real-time collaborative platform for the Architecture, Engineering, and Construction (AEC) industry. The system enables multi-user access to BIM data through avatars, supports direct interaction with 3D models and associated metadata, and maintains a persistent virtual environment that evolves alongside project development. Key functionalities include interactive design controls, real-time decision-making support, and integrated training capabilities. A prototype was developed using Unreal Engine and supporting technologies to validate the approach. The results demonstrate improved interdisciplinary collaboration, reduced information loss during design iteration, and enhanced stakeholder engagement. This research highlights the potential of BIM-based Virtual Worlds to transform AEC collaboration by fostering an open, scalable ecosystem that bridges immersive environments with data-driven design and construction processes.
(This article belongs to the Special Issue Architecture in the Digital Age)

16 pages, 22201 KB  
Article
MECO: Mixture-of-Expert Codebooks for Multiple Dense Prediction Tasks
by Gyutae Hwang and Sang Jun Lee
Sensors 2025, 25(17), 5387; https://doi.org/10.3390/s25175387 - 1 Sep 2025
Abstract
Autonomous systems operating in embedded environments require robust scene understanding under computational constraints. Multi-task learning (MTL) offers a compact alternative to deploying multiple task-specific models by jointly solving dense prediction tasks. However, recent MTL models often suffer from entangled shared feature representations and significant computational overhead. To address these limitations, we propose Mixture-of-Expert Codebooks (MECO), a novel multi-task learning framework that leverages vector quantization to build a Mixture of Experts with lightweight codebooks. MECO disentangles task-generic and task-specific representations and enables efficient learning across multiple dense prediction tasks such as semantic segmentation and monocular depth estimation. The model is trained end-to-end using a composite loss that combines task-specific objectives and vector quantization losses. We evaluate MECO on a real-world driving dataset collected in challenging embedded scenarios. MECO achieves a +0.4% mIoU improvement in semantic segmentation and maintains depth estimation accuracy comparable to the baseline, while reducing model parameters and FLOPs by 18.33% and 28.83%, respectively. These results demonstrate the potential of vector quantization-based Mixture-of-Experts modeling for efficient and scalable multi-task learning in embedded environments.
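The "vector quantization losses" in MECO's composite objective are presumably of the standard VQ-VAE form: a codebook term pulling entries toward encoder outputs, plus a β-weighted commitment term doing the reverse. A hedged numpy sketch; the β value and the nearest-neighbour assignment follow common VQ practice, not necessarily MECO's exact design:

```python
import numpy as np

def vector_quantize(z, codebook):
    """Assign each feature vector (row of z) to its nearest codebook entry."""
    d = np.linalg.norm(z[:, None, :] - codebook[None, :, :], axis=-1)
    idx = np.argmin(d, axis=1)
    return codebook[idx], idx

def vq_losses(z, codebook, beta=0.25):
    """VQ-VAE style terms: the codebook loss moves entries toward encoder
    outputs (stop-gradient on z in training); the commitment loss,
    weighted by beta, keeps the encoder close to its assigned codes."""
    z_q, _ = vector_quantize(z, codebook)
    codebook_loss = np.mean((z_q - z) ** 2)
    commitment_loss = beta * np.mean((z - z_q) ** 2)
    return codebook_loss, commitment_loss
```

In a full framework these terms are simply added to the task losses (segmentation, depth) to form the composite objective the abstract describes.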

18 pages, 2884 KB  
Article
Research on Multi-Path Feature Fusion Manchu Recognition Based on Swin Transformer
by Yu Zhou, Mingyan Li, Hang Yu, Jinchi Yu, Mingchen Sun and Dadong Wang
Symmetry 2025, 17(9), 1408; https://doi.org/10.3390/sym17091408 - 29 Aug 2025
Abstract
Recognizing Manchu words is challenging due to complex character variations, subtle differences between similar characters, and homographic polysemy. Most studies rely on character segmentation for character-level recognition or use convolutional neural networks (CNNs) to encode word images for word-level recognition. However, these methods can introduce segmentation errors or lose semantic information, reducing word recognition accuracy. To address the limitations of CNNs in long-range dependency modeling and to enhance semantic coherence, we propose a hybrid architecture that fuses the spatial features of the original images with spectral features. Specifically, we first apply the Short-Time Fourier Transform (STFT) to the raw input images to obtain multi-view spectral features. We then use a primary CNN block and a pair of symmetric CNN blocks to build a symmetric spectral enhancement module, which encodes the raw input features and the multi-view spectral features. Next, we design a Swin Transformer-based feature fusion module to fuse the multi-view spectral embeddings and concatenate them with the raw input embedding. Finally, a Transformer decoder produces the target output. We conducted extensive experiments on Manchu word benchmark datasets to evaluate the effectiveness of the proposed framework. The results demonstrate that it performs robustly in word recognition and generalizes well. Our model also outperformed baseline methods on multiple writing-style font-recognition tasks.
(This article belongs to the Section Computer)

47 pages, 1148 KB  
Review
Burnout and the Brain—A Mechanistic Review of Magnetic Resonance Imaging (MRI) Studies
by James Chmiel and Donata Kurpas
Int. J. Mol. Sci. 2025, 26(17), 8379; https://doi.org/10.3390/ijms26178379 - 28 Aug 2025
Abstract
Occupational burnout is ubiquitous yet still debated as a disease entity. Previous reviews surveyed multiple biomarkers but left their neural substrate unclear. We therefore asked: What, if any, reproducible magnetic-resonance signature characterises burnout? Following PRISMA principles adapted for mechanistic synthesis, two reviewers searched PubMed, Scopus, Google Scholar, ResearchGate and Cochrane from January 2000 to May 2025 using “MRI/fMRI” AND “burnout”. After duplicate removal and multi-stage screening, 17 clinical studies met predefined inclusion criteria (English language, MRI outcomes, validated burnout diagnosis). In total, ≈1365 participants were scanned, 880 with clinically significant burnout and 470 controls. Uniform Maslach Burnout Inventory thresholds defined cases; most studies matched age and sex, and all excluded primary neurological disease. Structural morphometry (8/17 studies) revealed consistent amygdala enlargement—predominantly in women—and grey-matter loss in dorsolateral/ventromedial prefrontal cortex and striatal caudate–putamen, while hippocampal volume remained unaffected, distinguishing burnout from PTSD or depression. Resting-state and task fMRI (9/17 studies) showed fronto-cortical hyper-activation, weakened amygdala–ACC coupling, and progressive fragmentation of rich-club networks, collectively indicating compensatory executive overdrive and global inefficiency. Two longitudinal cohorts and several intervention sub-studies demonstrated partial reversal of cortical thinning and limbic hyper-reactivity after mindfulness, exercise, cognitive-behavioural therapy, neurofeedback, or rTMS, underscoring plasticity. Across heterogeneous paradigms and populations, MRI converges on a coherent, sex-modulated but reversible brain-networkopathy that satisfies objective disease criteria. These findings justify early neuro-imaging-based triage, circuit-targeted therapy, and formal nosological recognition of burnout as a mental disorder, with policy ramifications for occupational health and insurance parity.

42 pages, 5613 KB  
Article
YOLOv11-EMD: An Enhanced Object Detection Algorithm Assisted by Multi-Stage Transfer Learning for Industrial Steel Surface Defect Detection
by Weipeng Shi, Junlin Dai, Changhe Li and Na Niu
Mathematics 2025, 13(17), 2769; https://doi.org/10.3390/math13172769 - 28 Aug 2025
Abstract
To address inaccurate positioning, weak feature extraction, and poor cross-domain adaptability in the detection of steel surface defects, this paper proposes an improved YOLOv11-EMD algorithm and integrates a multi-stage transfer learning framework to achieve high-precision, robust, and low-cost industrial defect detection. Specifically, the InnerEIoU loss function is introduced to improve bounding box regression accuracy, the multi-scale dilated attention (MSDA) module is integrated to enhance multi-scale feature fusion, and the Cross-Stage Partial Network with 3 Convolutions and Kernel size 2 Dynamic Convolution (C3k2_DynamicConv) module is embedded to improve the representation of, and adaptability to, complex defects. To address performance degradation when the model migrates between data domains, a multi-stage transfer learning framework is constructed, combining source-domain pre-training with target-domain fine-tuning to improve generalization under shifting data distributions. On a comprehensive dataset constructed from NEU-DET and Severstal steel defect images, YOLOv11-EMD achieved a precision of 0.942, a recall of 0.868, and an mAP@50 of 0.949, which are 3.5%, 0.8%, and 1.6% higher than those of the original model, respectively. On a cross-scenario mixed dataset composed of NEU-DET and GC10-DET data, the mAP@50 was 0.799, outperforming mainstream detection algorithms. The multi-stage transfer strategy shortens training time by 3.2% and increases mAP by 8.8% while maintaining accuracy. The proposed method improves defect detection accuracy, generalizes well, and is suitable for automated quality inspection in diverse industrial scenarios.

19 pages, 29645 KB  
Article
Defect Detection in GIS X-Ray Images Based on Improved YOLOv10
by Guoliang Xu, Xiaolong Bai and Menghao Huang
Sensors 2025, 25(17), 5310; https://doi.org/10.3390/s25175310 - 26 Aug 2025
Abstract
Timely and accurate detection of internal defects in Gas-Insulated Switchgear (GIS) with X-ray imaging is critical for power system reliability. However, automated detection faces significant challenges from small, low-contrast defects and complex background structures. This paper proposes an enhanced object-detection model based on the lightweight YOLOv10n framework, specifically optimized for this task. Key improvements include adopting the Normalized Wasserstein Distance (NWD) loss function for small-object localization, integrating Monte Carlo attention (MCAttn) and Parallelized Patch-Aware (PPA) attention to enhance feature extraction, and designing a GFPN-inspired neck for improved multi-scale feature fusion. The model was rigorously evaluated on a custom GIS X-ray dataset. The final model achieved a mean Average Precision (mAP) of 0.674 (IoU 0.5:0.95), a 5.0 percentage point improvement over the YOLOv10n baseline, surpassing the other models compared. Qualitative results also confirmed the model's enhanced capability to detect challenging small and low-contrast defects. This study presents an effective approach for automated GIS defect detection, with significant potential to enhance power grid maintenance efficiency and safety.
(This article belongs to the Section Sensing and Imaging)
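The NWD loss adopted above is a common replacement for IoU on tiny boxes: each box (cx, cy, w, h) is modelled as a 2-D Gaussian, and the 2-Wasserstein distance between the two Gaussians is mapped through an exponential to a bounded similarity. A sketch following the commonly cited formulation; the constant C is dataset-dependent, and the value used here is an assumption:

```python
import numpy as np

def nwd(box_a, box_b, C=12.8):
    """Normalized Wasserstein distance similarity between two boxes
    (cx, cy, w, h), each modelled as a 2-D Gaussian with covariance
    diag((w/2)^2, (h/2)^2). Returns a value in (0, 1]."""
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    # Squared 2-Wasserstein distance between the two Gaussians.
    w2_sq = ((cxa - cxb) ** 2 + (cya - cyb) ** 2
             + ((wa - wb) / 2) ** 2 + ((ha - hb) / 2) ** 2)
    return np.exp(-np.sqrt(w2_sq) / C)

def nwd_loss(box_a, box_b, C=12.8):
    # Loss form: 1 minus similarity, zero for identical boxes.
    return 1.0 - nwd(box_a, box_b, C)
```

Unlike IoU, this similarity stays smooth and non-zero even when small boxes do not overlap at all, which is why it helps small-object localization.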

24 pages, 103094 KB  
Article
A Method for Automated Detection of Chicken Coccidia in Vaccine Environments
by Ximing Li, Qianchao Wang, Lanqi Chen, Xinqiu Wang, Mengting Zhou, Ruiqing Lin and Yubin Guo
Vet. Sci. 2025, 12(9), 812; https://doi.org/10.3390/vetsci12090812 - 26 Aug 2025
Abstract
Vaccines play a crucial role in the prevention and control of chicken coccidiosis, effectively reducing economic losses in the poultry industry and significantly improving animal welfare. To ensure vaccine production quality and immune effect, accurate detection of chicken Coccidia oocysts in vaccines is essential. However, this task remains challenging due to the minute size of oocysts, their variable spatial orientation, and morphological similarity among species. We therefore propose YOLO-Cocci, a chicken coccidia detection model based on YOLOv8n, designed to improve the detection accuracy of oocysts in vaccine environments. First, an efficient multi-scale attention (EMA) module was added to the backbone to enhance feature extraction and enable more precise focus on oocyst regions. Second, we developed an inception-style multi-scale fusion pyramid network (IMFPN) as an efficient neck. By integrating richer low-level features and applying convolutional kernels of varying sizes, IMFPN effectively preserves the features of small objects and enhances feature representation, thereby improving detection accuracy. Finally, we designed a lightweight feature-reconstructed and partially decoupled detection head (LFPD-Head), which enhances detection accuracy while reducing both model parameters and computational cost. Experimental results show that YOLO-Cocci achieves an mAP@0.5 of 89.6%, an increase of 6.5% over the baseline model, while reducing parameters and computation by 14% and 12%, respectively. Notably, for Eimeria necatrix, mAP@0.5 increased by 14%. To verify the practical effect of the improved detection algorithm, we developed client software that performs automatic detection and visualizes the results. This study will help raise the level of automated vaccine quality assessment and thus promote improved animal welfare.

24 pages, 1747 KB  
Article
HortiVQA-PP: Multitask Framework for Pest Segmentation and Visual Question Answering in Horticulture
by Zhongxu Li, Chenxi Du, Shengrong Li, Yaqi Jiang, Linwan Zhang, Changhao Ju, Fansen Yue and Min Dong
Horticulturae 2025, 11(9), 1009; https://doi.org/10.3390/horticulturae11091009 - 25 Aug 2025
Abstract
A multimodal interactive system, HortiVQA-PP, is proposed for horticultural scenarios, with the aim of achieving precise identification of pests and their natural predators, modeling ecological co-occurrence relationships, and providing intelligent question-answering services tailored to agricultural users. The system integrates three core modules: semantic segmentation, pest–predator co-occurrence detection, and knowledge-enhanced visual question answering. A multimodal dataset comprising 30 pest categories and 10 predator categories has been constructed, encompassing annotated images and corresponding question–answer pairs. In the semantic segmentation task, HortiVQA-PP outperformed existing models across all five evaluation metrics, achieving a precision of 89.6%, recall of 85.2%, F1-score of 87.3%, mAP@50 of 82.4%, and IoU of 75.1%, representing an average improvement of approximately 4.1% over the Segment Anything model. For the pest–predator co-occurrence matching task, the model attained a multi-label accuracy of 83.5%, a reduced Hamming Loss of 0.063, and a macro-F1 score of 79.4%, significantly surpassing methods such as ASL and ML-GCN, thereby demonstrating robust structural modeling capability. In the visual question answering task, the incorporation of a horticulture-specific knowledge graph enhanced the model’s reasoning ability. The system achieved 48.7% in BLEU-4, 54.8% in ROUGE-L, 43.3% in METEOR, 36.9% in exact match (EM), and a GPT expert score of 4.5, outperforming mainstream models including BLIP-2, Flamingo, and MiniGPT-4 across all metrics. Experimental results indicate that HortiVQA-PP exhibits strong recognition and interaction capabilities in complex pest scenarios, offering a high-precision, interpretable, and widely applicable artificial intelligence solution for digital horticulture.
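The co-occurrence matching results above are reported as multi-label accuracy, Hamming loss, and macro-F1. A minimal sketch of the latter two metrics on binary label matrices (rows are samples, columns are labels):

```python
import numpy as np

def hamming_loss(y_true, y_pred):
    """Fraction of individual label slots predicted incorrectly (lower is better)."""
    return float(np.mean(y_true != y_pred))

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 scores (treats rare labels equally)."""
    f1s = []
    for k in range(y_true.shape[1]):
        tp = np.sum((y_true[:, k] == 1) & (y_pred[:, k] == 1))
        fp = np.sum((y_true[:, k] == 0) & (y_pred[:, k] == 1))
        fn = np.sum((y_true[:, k] == 1) & (y_pred[:, k] == 0))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(f1s))
```

Hamming loss counts every wrong slot equally, while macro-F1 averages per-label F1 so that rare predator labels weigh as much as common pest labels.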
22 pages, 3881 KB  
Article
A Novel Fish Pose Estimation Method Based on Semi-Supervised Temporal Context Network
by Yuanchang Wang, Ming Wang, Jianrong Cao, Chen Wang, Zhen Wu and He Gao
Biomimetics 2025, 10(9), 566; https://doi.org/10.3390/biomimetics10090566 - 25 Aug 2025
Abstract
Underwater biomimetic robotic fish are emerging as vital platforms for ocean exploration tasks such as environmental monitoring, biological observation, and seabed investigation, particularly in areas inaccessible to humans. Central to their effectiveness is high-precision fish pose estimation, which enables detailed analysis of swimming patterns and ecological behavior, while informing the design of agile, efficient bio-inspired robots. To address the widespread scarcity of high-quality motion datasets in this domain, this study presents a custom-built dual-camera experimental platform that captures multi-view sequences of carp exhibiting three representative swimming behaviors—straight swimming, backward swimming, and turning—resulting in a richly annotated dataset. To overcome key limitations in existing pose estimation methods, including heavy reliance on labeled data and inadequate modeling of temporal dependencies, a novel Semi-supervised Temporal Context-Aware Network (STC-Net) is proposed. STC-Net incorporates two innovative unsupervised loss functions—temporal continuity loss and pose plausibility loss—to leverage both annotated and unannotated video frames, and integrates a Bi-directional Convolutional Recurrent Neural Network to model spatio-temporal correlations across adjacent frames. These enhancements are architecturally compatible and computationally efficient, preserving end-to-end trainability. Experimental results on the proposed dataset demonstrate that STC-Net achieves a keypoint detection RMSE of 9.71, providing a robust and scalable solution for biological pose estimation under complex motion scenarios. Full article
(This article belongs to the Special Issue Bionic Robotic Fish: 2nd Edition)
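The temporal continuity loss described above rewards predictions that change smoothly across adjacent frames, which needs no labels and so can use unannotated video. The abstract does not give the exact formulation, so the function below is a minimal sketch of the idea with illustrative names and shapes, not the STC-Net implementation:

```python
def temporal_continuity_loss(keypoints):
    """Mean squared displacement of predicted keypoints between
    consecutive frames. `keypoints` is a list of frames, each a list
    of (x, y) tuples; shapes and naming are illustrative assumptions."""
    total, count = 0.0, 0
    for prev, curr in zip(keypoints, keypoints[1:]):
        for (x0, y0), (x1, y1) in zip(prev, curr):
            total += (x1 - x0) ** 2 + (y1 - y0) ** 2
            count += 1
    return total / count if count else 0.0

# A smooth trajectory incurs a small penalty; an erratic one is punished,
# which is the signal the unlabeled frames contribute during training.
smooth = [[(0.0, 0.0), (1.0, 1.0)], [(0.1, 0.0), (1.0, 1.1)]]
jumpy = [[(0.0, 0.0), (1.0, 1.0)], [(3.0, 0.0), (1.0, 4.0)]]
print(temporal_continuity_loss(smooth) < temporal_continuity_loss(jumpy))
```

In practice such a term would be computed on differentiable network outputs (e.g. tensors rather than tuples) and added to the supervised keypoint loss with a weighting coefficient.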
14 pages, 7081 KB  
Article
SupGAN: A General Super-Resolution GAN-Promoting Training Method
by Tao Wu, Shuo Xiong, Qiuhang Chen, Huaizheng Liu, Weijun Cao and Haoran Tuo
Appl. Sci. 2025, 15(17), 9231; https://doi.org/10.3390/app15179231 - 22 Aug 2025
Abstract
Image super-resolution (SR) methods based on Generative Adversarial Networks (GANs) have achieved impressive results in terms of visual performance. However, the weights of the loss functions in these methods are usually set to fixed values manually, which cannot fully adapt to different datasets and tasks and may degrade the perceptual quality of the SR images. To address this issue and further improve visual quality, we propose a perception-driven SupGAN, which improves the generator and loss function of GAN-based image super-resolution models. The generator adopts multi-scale feature extraction and fusion to restore SR images with diverse and fine textures. We design a network-training method based on the proportion of high-frequency information in images (BHFTM), which uses the proportion of high-frequency information obtained through the Canny operator to set the weights of the loss function. In addition, we employ the four-patch method to better simulate the degradation of complex real-world scenarios. We extensively test our method and compare it with recent SR methods (BSRGAN, Real-ESRGAN, RealSR, SwinIR, LDL, etc.) on different types of datasets (OST300, 2020track1, RealWorld38, BSDS100, etc.) with a scaling factor of ×4. The results show that the NIQE metric improves, and that SupGAN generates more natural and fine textures while suppressing unpleasant artifacts. Full article
(This article belongs to the Special Issue Collaborative Learning and Optimization Theory and Its Applications)
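The BHFTM idea above keys the loss weights to how much high-frequency (edge/texture) content an image contains. The paper derives that proportion from a Canny edge map; the sketch below stands in a simple gradient-magnitude threshold for Canny and uses an invented weight mapping, so both the threshold and the `loss_weights` scheme are assumptions for illustration only:

```python
def high_freq_proportion(img, threshold=0.2):
    """img: 2D list of floats in [0, 1]. Returns the fraction of pixels
    whose forward-difference gradient magnitude exceeds the threshold,
    a crude stand-in for the fraction of Canny edge pixels."""
    h, w = len(img), len(img[0])
    edges = 0
    for y in range(h - 1):
        for x in range(w - 1):
            gx = img[y][x + 1] - img[y][x]
            gy = img[y + 1][x] - img[y][x]
            if (gx * gx + gy * gy) ** 0.5 > threshold:
                edges += 1
    return edges / ((h - 1) * (w - 1))

def loss_weights(p):
    """Illustrative mapping (an assumption, not the paper's formula):
    texture-rich images weight the perceptual term more, smooth images
    weight the pixel-wise term more."""
    return {"pixel": 1.0 - 0.5 * p, "perceptual": 0.5 + 0.5 * p}

flat = [[0.5] * 8 for _ in range(8)]  # smooth patch, no edges
checker = [[(x + y) % 2 * 1.0 for x in range(8)] for y in range(8)]  # textured

p_flat = high_freq_proportion(flat)     # → 0.0
p_tex = high_freq_proportion(checker)   # → 1.0
print(p_flat, p_tex, loss_weights(p_tex))
```

A production version would compute the edge map with `cv2.Canny` on the training crop and feed the resulting weights into the combined GAN loss each step.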