Search Results (60)

Search Parameters:
Keywords = parallel ViT

22 pages, 2186 KB  
Article
ConvDeiT-Tiny: Adding Local Inductive Bias to DeiT-Ti for Enhanced Maize Leaf Disease Classification
by Damaris Waema, Waweru Mwangi and Petronilla Muriithi
Plants 2026, 15(6), 982; https://doi.org/10.3390/plants15060982 - 23 Mar 2026
Viewed by 357
Abstract
Reliable identification of maize leaf diseases is critical for mitigating crop losses, particularly in regions where farmers have limited access to experts. Although vision transformers (ViTs) have recently demonstrated strong performance in image recognition, their weak inductive bias and limited modeling of local texture patterns make them ill-suited to fine-grained maize leaf disease classification. To address these limitations, we propose ConvDeiT-Tiny, a lightweight hybrid ViT that improves DeiT-Ti by placing depthwise convolutions in parallel with the multi-head self-attention modules in the first three transformer blocks. The local and global features captured by the convolution and attention modules are concatenated along the embedding dimension and fused using a multilayer perceptron, yielding richer token representations without significantly increasing model size. Across three datasets, ConvDeiT-Tiny (6.9 M parameters) consistently outperformed DeiT-Ti, DeiT-Ti-Distilled, and DeiT-S (21.7 M parameters) when trained from scratch. With transfer learning, ConvDeiT-Tiny achieved accuracies of 99.15%, 99.35%, and 98.60% on the CD&S, primary, and Kaggle datasets, respectively, surpassing many previous studies with far fewer parameters. For explainability, we present gradient-weighted transformer attribution visualizations showing the disease lesions that drive model predictions. These results indicate that injecting local inductive bias into early transformer blocks is beneficial for accurate maize leaf disease classification. Full article
(This article belongs to the Special Issue AI-Driven Machine Vision Technologies in Plant Science)
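The parallel local/global fusion this abstract describes can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' ConvDeiT-Tiny code: the single attention head, the random weights, and the 4×4 token grid are all hypothetical stand-ins.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, rng):
    # Single-head attention with random projections (illustration only).
    n, d = tokens.shape
    wq, wk, wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    return softmax(q @ k.T / np.sqrt(d)) @ v

def depthwise_conv(tokens, grid, kernel):
    # 3x3 depthwise convolution over tokens laid out on a square grid,
    # one kernel per embedding channel, zero padding.
    n, d = tokens.shape
    h = w = grid
    x = tokens.reshape(h, w, d)
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(x)
    for i in range(h):
        for j in range(w):
            patch = xp[i:i + 3, j:j + 3, :]          # (3, 3, d) neighborhood
            out[i, j] = np.einsum('ijd,ijd->d', patch, kernel)
    return out.reshape(n, d)

def parallel_block(tokens, grid, rng):
    # Local (conv) and global (attention) branches run in parallel;
    # outputs are concatenated along the embedding axis and fused by an MLP.
    n, d = tokens.shape
    local = depthwise_conv(tokens, grid, rng.standard_normal((3, 3, d)))
    glob = self_attention(tokens, rng)
    fused = np.concatenate([local, glob], axis=-1)   # (n, 2d)
    w_mlp = rng.standard_normal((2 * d, d)) / np.sqrt(2 * d)
    return fused @ w_mlp                             # back to (n, d)

rng = np.random.default_rng(0)
tokens = rng.standard_normal((16, 8))   # 4x4 token grid, 8-dim embeddings
out = parallel_block(tokens, grid=4, rng=rng)
print(out.shape)  # (16, 8)
```

Per the abstract, both branches see the same tokens; concatenation doubles the embedding width, and the fusing MLP projects back to the original dimension so the block composes with standard transformer layers.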

20 pages, 3878 KB  
Article
A Hybrid Multimodal Cancer Diagnostic Framework Integrating Deep Learning of Histopathology and Whispering Gallery Mode Optical Sensors
by Shereen Afifi, Amir R. Ali, Nada Haytham Abdelbasset, Youssef Poulis, Yasmin Yousry, Mohamed Zinal, Hatem S. Abdullah, Miral Y. Selim and Mohamed Hamed
Diagnostics 2026, 16(6), 848; https://doi.org/10.3390/diagnostics16060848 - 12 Mar 2026
Viewed by 479
Abstract
Background/Objectives: Biopsy examination remains the gold standard for cancer diagnosis, relying on histopathological assessment of tissue samples to identify malignant changes. However, manual interpretation of histopathological slides is time-consuming, subjective, and susceptible to inter-observer variability. The digitization of histopathological images enables automated analysis and offers opportunities to support clinicians with more consistent and objective diagnostic tools. This study aims to enhance cancer diagnosis by proposing a hybrid framework that integrates deep-learning-based histopathological image analysis with Whispering Gallery Mode (WGM) optical sensing for complementary tissue characterization. Methods: The proposed framework combines automated tumor classification from histopathological images with biochemical signal analysis obtained from WGM optical sensors. Deep learning models, including EfficientNet-B0, InceptionV3, and Vision Transformer (ViT), were employed for binary and multi-class tumor classification using the BreakHis dataset. To address class imbalance, a Deep Convolutional Generative Adversarial Network (DCGAN) was utilized to generate synthetic histopathological images alongside conventional data augmentation techniques. In parallel, WGM optical sensors were incorporated to capture subtle tissue-specific signatures, with machine learning algorithms enabling automated feature extraction and classification of the acquired signals. Results: In multi-class classification, InceptionV3 combined with DCGAN-based augmentation achieved an accuracy of 94.45%, while binary classification reached 96.49%. Fine-tuned Vision Transformer models achieved a higher classification accuracy of 98% on the BreakHis dataset. The integration of WGM optical sensing provided additional biochemical information, offering complementary insights to image-based analysis and supporting more robust diagnostic decision-making. 
Conclusions: The proposed hybrid framework demonstrates the potential of combining deep-learning-based histopathological image analysis with WGM optical sensing to improve the accuracy and reliability of cancer classification. By integrating morphological and biochemical information, the framework offers a promising approach for enhanced, objective, and supportive cancer diagnostic systems. Full article

31 pages, 1466 KB  
Article
Fusing Geometric and Semantic Features via Cosine Similarity Cross-Attention for Remote Sensing Scene Classification
by Xuefei Xu and Chengjun Xu
Sensors 2026, 26(5), 1613; https://doi.org/10.3390/s26051613 - 4 Mar 2026
Viewed by 320
Abstract
High-resolution remote sensing image scene classification (HRRSI-SC) is crucial for obtaining accurate Earth surface information. However, the task remains challenging due to significant background interference, high intra-class variation, and subtle inter-class similarities. Convolutional neural networks (CNNs) are constrained by their local receptive fields, which limits their ability to capture long-range spatial dependencies. On the other hand, Vision Transformers (e.g., ViT-B-16) excel at global feature extraction but often suffer from high computational complexity and may lack the inherent inductive biases for local feature modeling that CNNs possess. To address these limitations, this paper proposes a cross-level feature complementary classification framework based on Lie Group manifold space, termed CBCAM-LGM. Within the proposed CBCAM-LGM framework, multi-granularity features are first distilled via a global average pooling layer to suppress redundant information. The core of our approach, the cross-level bidirectional complementary attention module (CBCAM), then enables the adaptive fusion of features from both branches through a cross-query attention mechanism. Furthermore, by employing parallel dilated convolutions and a parameter-sharing strategy, the model captures multi-scale contextual information by sharing a single set of convolutional weights, which reduces the computational complexity to merely 1.21 GMACs while preserving multi-scale representation with minimal parameter overhead. Extensive experiments on challenging benchmarks demonstrate the model’s efficacy, as it achieves a state-of-the-art classification accuracy of 97.81% on the AID, surpassing the ViT-B-16 baseline by 1.63%, while containing only 11.237 million parameters (an 87% reduction). These results collectively affirm that our model presents an efficient solution characterized by high accuracy and low complexity. Full article
(This article belongs to the Section Remote Sensors)
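The cross-query idea named in this article's title can be illustrated with one direction of a cosine-similarity cross-attention in NumPy. This is a hedged sketch, not the CBCAM module itself: the temperature, feature shapes, and random inputs are assumptions.

```python
import numpy as np

def cosine_cross_attention(query_feats, key_feats, temperature=10.0):
    """Cross-attention in which one branch queries the other and the
    affinity is cosine similarity rather than a scaled dot product."""
    qn = query_feats / np.linalg.norm(query_feats, axis=-1, keepdims=True)
    kn = key_feats / np.linalg.norm(key_feats, axis=-1, keepdims=True)
    sim = qn @ kn.T                      # cosine similarities in [-1, 1]
    w = np.exp(temperature * sim)
    w /= w.sum(axis=-1, keepdims=True)   # row-wise softmax over the keys
    return w @ key_feats                 # query branch enriched with key branch

rng = np.random.default_rng(1)
cnn_feats = rng.standard_normal((6, 16))   # local-branch tokens (hypothetical)
vit_feats = rng.standard_normal((6, 16))   # global-branch tokens (hypothetical)
fused = cosine_cross_attention(cnn_feats, vit_feats)
print(fused.shape)  # (6, 16)
```

Normalizing both sides bounds the affinities, which is one common motivation for cosine attention; a bidirectional module would apply the same operation with the roles of the two branches swapped.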

16 pages, 2599 KB  
Article
Toward Patient-Specific Digital Twin Models of Disease Progression Using Sequential Medical Imaging and EHR Data
by Hasan Ali Eriş, Muhammed Ali Aydın and Mehmet Ali Erturk
Appl. Sci. 2026, 16(4), 2104; https://doi.org/10.3390/app16042104 - 21 Feb 2026
Viewed by 339
Abstract
Artificial intelligence (AI) is reshaping healthcare by supporting faster and more informed clinical decisions. However, the complexity of human health makes accurate predictive modeling challenging. In this study, we introduce a methodological framework for constructing intelligent digital twins of disease progression by combining patients’ sequential medical images with temporally aligned electronic health records (EHRs). EHRs in this context include structured clinical parameters such as laboratory test results, demographic characteristics, and medication information. The existing literature provides limited approaches that jointly forecast future medical images and clinical status using long-term historical data. Our framework integrates aligned temporal image sequences with these EHR features and employs either ConvLSTM or ViViT-based spatio-temporal encoders, optionally coupled with a generative module for future image synthesis. While awaiting access to patient datasets, we conducted an initial evaluation using a single-cell time-lapse microscopy dataset whose temporal dynamics resemble patient data. Both systems generate time-ordered image sequences that evolve under changing conditions, and the shifting nutrient environment in microfluidic channels parallels the temporal variations observed in patients’ EHR records. This preliminary study demonstrates the broader applicability of our model to datasets containing long-term sequential images and associated parameters, supporting its potential for future patient-specific digital twin development. Full article

35 pages, 2479 KB  
Article
Integrating Vision Transformer and Time–Frequency Analysis for Stock Volatility Prediction
by Myungjin Wooh and Poongjin Cho
Mathematics 2025, 13(23), 3787; https://doi.org/10.3390/math13233787 - 25 Nov 2025
Viewed by 3708
Abstract
Financial market volatility prediction remains challenging due to data nonlinearity and non-stationarity. Existing quantitative approaches struggle to capture the multi-scale information embedded in time series, and convolutional neural network (CNN)-based image approaches primarily emphasize local feature extraction; Vision Transformers (ViTs), by contrast, more directly capture global dependencies through self-attention. To address these limitations, we propose TF-ViTNet, a dual-path hybrid model that integrates time–frequency scalograms generated via the Continuous Wavelet Transform (CWT) with ViTs for volatility prediction. While time–frequency analysis has been widely adopted in prior studies, applying ViTs to CWT-based scalograms within a parallel architecture provides a new perspective for capturing global spatiotemporal structures in financial volatility. The model employs a parallel architecture in which a Vision Transformer pathway learns global spatiotemporal patterns from scalograms while a Long Short-Term Memory (LSTM) pathway captures temporal characteristics from technical indicators, with both streams integrated at the final stage for volatility prediction. Empirical analysis using NASDAQ and S&P 500 index data from 2010 to 2024 demonstrates that TF-ViTNet consistently outperforms LSTM models using numerical data alone as well as existing benchmarks. In parallel architectures, Vision Transformers capture global patterns in scalograms more effectively than CNNs, achieving significant performance improvements, particularly for NASDAQ. The model maintains stable predictive power even during high-volatility regimes, demonstrating strong potential as a risk management tool. Data augmentation improves performance for the stable S&P 500 market but degrades results for the volatile NASDAQ market, emphasizing the need for market-specific augmentation strategies tailored to underlying signal-to-noise characteristics. Full article
(This article belongs to the Special Issue Advances in Machine Learning Applied to Financial Economics)

29 pages, 7050 KB  
Article
Mechanical Fault Diagnosis Method of Disconnector Based on Parallel Dual-Channel Model of Feature Fusion
by Chi Zhang, Hongzhong Ma and Tianyu Hu
Sensors 2025, 25(22), 6933; https://doi.org/10.3390/s25226933 - 13 Nov 2025
Cited by 1 | Viewed by 609
Abstract
Mechanical fault samples of disconnectors are scarce, the fault types are varied, and the fault signatures are weak, so mature diagnosis methods are lacking and hidden defects cannot be found in time. To solve this problem, a mechanical fault diagnosis method for disconnectors based on a parallel dual-channel feature fusion model is proposed. Firstly, the optimal parameters for variational mode decomposition (VMD) are obtained using the black-winged kite algorithm (BKA). After signal decomposition, the kurtosis values of the intrinsic mode functions (IMFs) are calculated, the IMFs are screened accordingly, and the signal is reconstructed. The reconstructed signal is input into a gated recurrent unit (GRU) to capture its time-series characteristics. Then, the vibration signal is converted into recurrence plots (RPs) to form an image set, which is input into a vision Transformer (ViT) to extract spatial characteristics. Finally, the time-series and spatial characteristics are fused, a multi-head self-attention mechanism is used for training, and softmax is used for fault classification. Results on measured data show that the diagnostic accuracy of the model for mechanical fault types reaches 97.9%, which is 3.2%, 4.3%, 1.0%, 2.4%, 2.9%, 1.8%, 2.1%, 9.0%, and 7.5% higher than the nine other models numbered #2–#10, respectively, verifying its effectiveness and adaptability. Full article
(This article belongs to the Section Fault Diagnosis & Sensors)
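The kurtosis-based IMF screening step in this pipeline is easy to sketch. The NumPy fragment below is illustrative only: the threshold of 3.0 (the kurtosis of a Gaussian) and the two toy modes are assumptions, not values from the paper.

```python
import numpy as np

def kurtosis(x):
    # Pearson kurtosis: E[(x - mu)^4] / sigma^4 (Gaussian reference = 3).
    x = x - x.mean()
    return np.mean(x**4) / (np.mean(x**2) ** 2)

def screen_and_reconstruct(imfs, threshold=3.0):
    """Keep the modes whose kurtosis exceeds the threshold (impulsive,
    fault-related content) and sum them back into one signal."""
    keep = [imf for imf in imfs if kurtosis(imf) > threshold]
    return np.sum(keep, axis=0), len(keep)

t = np.linspace(0, 1, 1000)
smooth = np.sin(2 * np.pi * 5 * t)     # low-kurtosis mode (sine: ~1.5)
impulsive = np.zeros_like(t)
impulsive[::100] = 5.0                  # sparse spikes: very high kurtosis
recon, kept = screen_and_reconstruct([smooth, impulsive])
print(kept)  # 1  (only the impulsive mode passes)
```

High kurtosis flags heavy-tailed, spiky components, which is why it is a common proxy for impulsive fault content when screening VMD modes.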

22 pages, 3753 KB  
Article
A High-Precision Hybrid Floating-Point Compute-in-Memory Architecture for Complex Deep Learning
by Zizhao Ma, Chunshan Wang, Qi Chen, Yifan Wang and Yufeng Xie
Electronics 2025, 14(22), 4414; https://doi.org/10.3390/electronics14224414 - 13 Nov 2025
Viewed by 1758
Abstract
As artificial intelligence (AI) advances, deep learning models are shifting from convolutional architectures to transformer-based structures, highlighting the importance of accurate floating-point (FP) calculations. Compute-in-memory (CIM) enhances matrix multiplication performance by breaking the von Neumann bottleneck. However, many FP-CIM implementations struggle to maintain high precision while achieving efficiency. This work proposes a high-precision hybrid floating-point compute-in-memory (Hy-FPCIM) architecture for the Vision Transformer (ViT), based on post-alignment and two different CIM macros: a Bit-wise Exponent Macro (BEM) and a Booth Mantissa Macro (BMM). The high-parallelism BEM efficiently implements exponent calculations in-memory with a Bit-Separated Exponent Summation Unit (BSESU) and a routing-efficient Bit-wise Max Finder (BMF). The high-precision BMM achieves nearly lossless mantissa computation in-memory with efficient Booth-4 encoding and a sense-amplifier-free Flying Mantissa Lookup Table based on 12T triple-port SRAM. The proposed Hy-FPCIM architecture achieves 23.7 TFLOPS/W energy efficiency and 0.754 TFLOPS/mm2 area efficiency, with 617 Kb/mm2 memory density in 28 nm technology. With its almost lossless architecture, the proposed Hy-FPCIM achieves an accuracy of 81.04% on ImageNet recognition tasks using ViT, a 0.03% decrease relative to the software baseline. This work presents significant advantages in both accuracy and energy efficiency, providing critical technology for complex deep learning applications. Full article
(This article belongs to the Special Issue Emerging Computing Paradigms for Efficient Edge AI Acceleration)

28 pages, 2594 KB  
Article
Comparative Evaluation of Parallel and Sequential Hybrid CNN–ViT Models for Wrist X-Ray Anomaly Detection
by Brian Mahlatse Malau and Micheal O. Olusanya
Appl. Sci. 2025, 15(22), 11865; https://doi.org/10.3390/app152211865 - 7 Nov 2025
Cited by 1 | Viewed by 1069
Abstract
Medical anomaly detection is challenged by limited labeled data and domain shifts, which reduce the performance and generalization of deep learning (DL) models. Hybrid convolutional neural network–Vision Transformer (CNN–ViT) architectures have shown promise, but they often rely on large datasets. Multistage transfer learning (MTL) provides a practical strategy to address this limitation. In this study, we evaluated parallel hybrids, where convolutional neural network (CNN) and Vision Transformer (ViT) features are fused after independent extraction, and sequential hybrids, where CNN features are passed through the ViT for integrated processing. Models were pretrained on non-wrist musculoskeletal radiographs (MURA), fine-tuned on the MURA wrist subset, and evaluated for cross-domain generalization on an external wrist X-ray dataset from the Al-Huda Digital X-ray Laboratory. Parallel hybrids (Xception–DeiT, a data-efficient image transformer) achieved the strongest internal performance (accuracy 88%), while sequential DenseNet–ViT generalized best in zero-shot transfer. After light fine-tuning, parallel hybrids achieved near-perfect accuracy (98%) and recall (1.00). Statistical analyses showed no significant difference between the parallel and sequential models (McNemar’s test), while backbone selection played a key role in performance. The Wilcoxon test found no significant difference in recall and F1-score between image and patient-level evaluations, suggesting balanced performance across both levels. Sequential hybrids achieved up to 7× faster inference than parallel models on the MURA test set while maintaining similar GPU memory usage (3.7 GB). Both fusion strategies produced clinically meaningful saliency maps that highlighted relevant wrist regions. 
These findings present the first systematic comparison of CNN–ViT fusion strategies for wrist anomaly detection, clarifying trade-offs between accuracy, generalization, interpretability, and efficiency in clinical AI. Full article
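The parallel-versus-sequential distinction this study evaluates reduces to where the ViT sits in the data path. A toy NumPy sketch, with random linear maps standing in for the pretrained backbones (all sizes hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)
W_CNN = rng.standard_normal((64, 32)) / 8      # stand-in CNN backbone
W_VIT = rng.standard_normal((32, 32)) / 6      # stand-in ViT on CNN features
W_VIT_RAW = rng.standard_normal((64, 32)) / 8  # stand-in ViT on the raw image

def parallel_hybrid(img):
    # Parallel fusion: extract CNN and ViT features independently,
    # then fuse by concatenation before the classifier.
    f_cnn = np.tanh(img @ W_CNN)
    f_vit = np.tanh(img @ W_VIT_RAW)
    return np.concatenate([f_cnn, f_vit], axis=-1)   # 64-dim fused vector

def sequential_hybrid(img):
    # Sequential fusion: pass the CNN features *through* the ViT
    # for integrated processing; only one feature stream survives.
    f_cnn = np.tanh(img @ W_CNN)
    return np.tanh(f_cnn @ W_VIT)                    # 32-dim vector

img = rng.standard_normal(64)        # flattened "X-ray" (hypothetical size)
print(parallel_hybrid(img).shape)    # (64,)
print(sequential_hybrid(img).shape)  # (32,)
```

The sketch also hints at the efficiency result quoted above: the sequential path runs one backbone after the other on a single stream, while the parallel path runs two full extractors and carries a wider fused vector.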

25 pages, 2714 KB  
Article
Evaluating Municipal Solid Waste Incineration Through Determining Flame Combustion to Improve Combustion Processes for Environmental Sanitation
by Jian Tang, Xiaoxian Yang, Wei Wang and Jian Rong
Sustainability 2025, 17(19), 8872; https://doi.org/10.3390/su17198872 - 4 Oct 2025
Viewed by 870
Abstract
Municipal solid waste (MSW) refers to solid and semi-solid waste generated during human production and daily activities. The process of incinerating such waste, known as municipal solid waste incineration (MSWI), serves as a critical method for reducing waste volume and recovering resources. Automatic online recognition of flame combustion status during MSWI is a key technical approach to ensuring system stability, addressing issues such as high pollution emissions, severe equipment wear, and low operational efficiency. However, when manually selecting optimized features and hyperparameters based on empirical experience, the MSWI flame combustion state recognition model suffers from high time consumption, strong dependency on expertise, and difficulty in adaptively obtaining optimal solutions. To address these challenges, this article proposes a method for constructing a flame combustion state recognition model optimized based on reinforcement learning (RL), long short-term memory (LSTM), and parallel differential evolution (PDE) algorithms, achieving collaborative optimization of deep features and model hyperparameters. First, the feature selection and hyperparameter optimization problem of the ViT-IDFC combustion state recognition model is transformed into an encoding design and optimization problem for the PDE algorithm. Then, the mutation and selection factors of the PDE algorithm are used as modeling inputs for LSTM, which predicts the optimal hyperparameters based on PDE outputs. Next, during the PDE-based optimization of the ViT-IDFC model, a policy gradient reinforcement learning method is applied to determine the parameters of the LSTM model. Finally, the optimized combustion state recognition model is obtained by identifying the feature selection parameters and hyperparameters of the ViT-IDFC model. 
Test results based on an industrial image dataset demonstrate that the proposed optimization algorithm improves the recognition performance of both left and right grate recognition models, with the left grate achieving a 0.51% increase in recognition accuracy and the right grate a 0.74% increase. Full article
(This article belongs to the Section Waste and Recycling)

15 pages, 2671 KB  
Article
Mechanisms of Thermal Color Change in Brown Elbaite–Fluorelbaite Tourmaline: Insights from Trace Elements and Spectral Signatures
by Kun Li and Suwei Yue
Minerals 2025, 15(10), 1032; https://doi.org/10.3390/min15101032 - 29 Sep 2025
Cited by 2 | Viewed by 835
Abstract
This study investigates the mechanism behind the heat-induced color change (brown to yellowish green) in Mn- and Fe-rich elbaite tourmaline under a reducing atmosphere at 500 °C. A combination of analytical techniques was employed, including gemological characterization, electron microprobe analysis (EMPA), laser ablation inductively coupled plasma mass spectrometry (LA-ICP-MS), Fourier-transform infrared spectroscopy (FTIR), Raman spectroscopy, and ultraviolet–visible (UV-Vis) spectroscopy. Chemical analysis confirmed the samples as intermediate members of the elbaite–fluorelbaite series, with an average formula of X(Na0.66□0.26Ca0.08)Σ1.00 Y(Li1.29Al1.10Mn0.31Fe2+0.15Ti0.01Zn0.01)Σ2.87 ZAl6 T[Si6O18] (BO3)3 V(OH)3.00 W(OH0.51F0.49)Σ1.00, enriched in Mn (17,346–20,669 μg/g) and Fe (8396–10,750 μg/g). Heat treatment enhanced transparency and induced strong pleochroism (yellowish green parallel to the c-axis, brown perpendicular to it). UV-Vis spectroscopy identified the origin of the brown color in the direction parallel to the c-axis: absorption bands at 730 nm (Fe2+ d–d transition, 5T2g → 5Eg), 540 nm (Fe2+→Fe3+ intervalence charge transfer, IVCT), and 415 nm (Fe2+→Ti4+ IVCT plus a possible Mn2+ contribution). After treatment, the 540 nm band vanished, creating a green transmission window and causing the color shift parallel to the c-axis; the spectra perpendicular to the c-axis remained largely unchanged. The disappearance of the 540 nm band, attributed to the reduction of Fe3+ to Fe2+ eliminating the Fe2+–Fe3+ pair interaction required for IVCT, is the primary color-change mechanism. The section parallel to the c-axis shows brown and yellow-green dichroism after heat treatment. A decrease in the IR intensity at 4170 cm−1 indicates a reduced Fe3+ concentration. 
The weakening or disappearance of the 4721 cm−1 absorption band of the infrared spectrum and the near-infrared 976 nm absorption band of the ultraviolet–visible spectrum provides diagnostic indicators for identifying heat treatment in similar brown elbaite–fluorelbaite. Full article

19 pages, 2063 KB  
Article
Multi-Task NoisyViT for Enhanced Fruit and Vegetable Freshness Detection and Type Classification
by Siavash Esfandiari Fard, Tonmoy Ghosh and Edward Sazonov
Sensors 2025, 25(19), 5955; https://doi.org/10.3390/s25195955 - 24 Sep 2025
Cited by 3 | Viewed by 2068
Abstract
Freshness is a critical indicator of fruit and vegetable quality, directly affecting nutrition, taste, safety, and reducing waste across supply chains. Accurate detection is essential for quality control, supporting producers during harvesting and storage, and guiding consumers in purchasing decisions. Traditional manual assessment methods remain subjective, labor-intensive, and susceptible to inconsistencies, highlighting the need for automated, efficient, and scalable solutions, such as the use of imaging sensors and Artificial Intelligence (AI). In this study, the efficacy of the Noisy Vision Transformer (NoisyViT) model was evaluated for fruit and vegetable freshness detection from images. Across five publicly available datasets, the model achieved accuracies exceeding 97% (99.85%, 97.98%, 99.01%, 99.77%, and 98.96%). To enhance generalization, these five datasets were merged into a unified dataset encompassing 44 classes of 22 distinct fruit and vegetable types, named Freshness44. The NoisyViT architecture was further expanded into a multi-task configuration featuring two parallel classification heads: one for freshness detection (binary classification) and the other for fruit and vegetable type classification (22-class classification). The multi-task NoisyViT model, fine-tuned on the Freshness44 dataset, attained outstanding accuracies of 99.60% for freshness detection and 99.86% for type classification, surpassing the single-head NoisyViT model (99.59% accuracy), conventional machine learning and CNN-based state-of-the-art methodologies. In practical terms, such a system can be deployed across supply chains, retail settings, or consumer applications to enable real-time, automated monitoring of fruit and vegetable quality. Overall, the findings underscore the effectiveness of the proposed multi-task NoisyViT model combined with the Freshness44 dataset, presenting a robust and scalable solution for the assessment of fruit and vegetable freshness. Full article
(This article belongs to the Section Sensors Development)
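The multi-task setup described above — one shared backbone feeding a binary freshness head and a 22-class type head — can be sketched as follows. The random projections below stand in for the NoisyViT backbone; none of this is the authors' code, and all sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
D, N_TYPES = 128, 22

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Stand-in for a shared backbone plus two parallel classification heads.
W_BACKBONE = rng.standard_normal((256, D)) / 16
W_FRESH = rng.standard_normal((D, 2)) / np.sqrt(D)       # binary freshness head
W_TYPE = rng.standard_normal((D, N_TYPES)) / np.sqrt(D)  # 22-class type head

def multi_task_forward(img):
    feats = np.tanh(img @ W_BACKBONE)   # shared representation
    return softmax(feats @ W_FRESH), softmax(feats @ W_TYPE)

img = rng.standard_normal(256)          # flattened input (hypothetical size)
p_fresh, p_type = multi_task_forward(img)
print(p_fresh.shape, p_type.shape)  # (2,) (22,)
```

Sharing the backbone means one forward pass serves both predictions, which is the usual efficiency argument for multi-task heads over two separate single-head models.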

25 pages, 21209 KB  
Article
Hyperspectral Image Classification Using a Spectral-Cube Gated Harmony Network
by Nana Li, Wentao Shen and Qiuwen Zhang
Electronics 2025, 14(17), 3553; https://doi.org/10.3390/electronics14173553 - 6 Sep 2025
Cited by 1 | Viewed by 1091
Abstract
In recent years, hybrid models that integrate Convolutional Neural Networks (CNNs) with Vision Transformers (ViTs) have achieved significant improvements in hyperspectral image classification (HSIC). Nevertheless, their complex architectures often lead to computational redundancy and inefficient feature fusion, particularly struggling to balance global modeling and local detail extraction in high-dimensional spectral data. To solve these issues, this paper proposes a Spectral-Cube Gated Harmony Network (SCGHN) that achieves efficient spectral–spatial joint feature modeling through a dynamic gating mechanism and hierarchical feature decoupling strategy. There are three primary innovative contributions of this paper as follows: Firstly, we design a Spectral Cooperative Parallel Convolution (SCPC) module that combines dynamic gating in the spectral dimension and spatial deformable convolution. This module adopts a dual-path parallel architecture that adaptively enhances key bands and captures local textures, thereby significantly improving feature discriminability at mixed ground object boundaries. Secondly, we propose a Dual-Gated Fusion (DGF) module that achieves cross-scale contextual complementarity through group convolution and lightweight attention, thereby enhancing hierarchical semantic representations with significantly lower computational complexity. Finally, by means of the coordinated design of 3D convolution and lightweight classification decision blocks, we construct an end-to-end lightweight framework that effectively alleviates the structural redundancy issues of traditional hybrid models. Extensive experiments on three standard hyperspectral datasets reveal that our SCGHN requires fewer parameters and exhibits lower computational complexity as compared with some existing HSIC methods. Full article

25 pages, 1734 KB  
Article
A Multimodal Affective Interaction Architecture Integrating BERT-Based Semantic Understanding and VITS-Based Emotional Speech Synthesis
by Yanhong Yuan, Shuangsheng Duo, Xuming Tong and Yapeng Wang
Algorithms 2025, 18(8), 513; https://doi.org/10.3390/a18080513 - 14 Aug 2025
Cited by 4 | Viewed by 2950
Abstract
To address the coarse emotional representation, low cross-modal alignment efficiency, and insufficient real-time response capability of current human–computer emotional language interaction, this paper proposes an affective interaction framework integrating BERT-based semantic understanding with VITS-based speech synthesis. The framework aims to enhance the naturalness, expressiveness, and response efficiency of human–computer emotional interaction. By introducing a modular layered design, a six-dimensional emotional space, a gated attention mechanism, and a dynamic model scheduling strategy, the system overcomes challenges such as limited emotional representation, modality misalignment, and high-latency responses. Experimental results demonstrate that the framework achieves superior performance in speech synthesis quality (MOS: 4.35), emotion recognition accuracy (91.6%), and response latency (<1.2 s), outperforming baseline models such as Tacotron2 and FastSpeech2. Through model lightweighting, GPU parallel inference, and load-balancing optimization, the system demonstrates robustness and generalizability across English and Chinese corpora in cross-linguistic tests. The modular architecture and dynamic scheduling ensure scalability and efficiency, enabling a more humanized and immersive interaction experience in typical application scenarios such as psychological companionship, intelligent education, and high-concurrency customer service. This study provides an effective technical pathway for developing the next generation of personalized and immersive affective intelligent interaction systems. Full article
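The gated attention mechanism for aligning modalities can be sketched as a learned gate that mixes a text embedding with an acoustic embedding. This is a hedged sketch, not the paper's architecture: the class name, dimensions, and the convex-combination fusion rule are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GatedModalityFusion(nn.Module):
    """Illustrative gated fusion: a sigmoid gate decides, per feature
    dimension, how much the text vs. acoustic embedding contributes."""
    def __init__(self, text_dim, audio_dim, fused_dim):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, fused_dim)
        self.audio_proj = nn.Linear(audio_dim, fused_dim)
        self.gate = nn.Sequential(
            nn.Linear(2 * fused_dim, fused_dim),
            nn.Sigmoid(),
        )

    def forward(self, text, audio):
        t = self.text_proj(text)
        a = self.audio_proj(audio)
        g = self.gate(torch.cat([t, a], dim=-1))  # gate values in [0, 1]
        return g * t + (1 - g) * a                # convex combination

text = torch.randn(4, 768)   # e.g. BERT sentence embeddings (assumed size)
audio = torch.randn(4, 256)  # e.g. prosody features (assumed size)
fused = GatedModalityFusion(768, 256, 512)(text, audio)  # -> (4, 512)
```

Because the gate output is bounded in [0, 1], the fused vector always stays a per-dimension interpolation of the two projected modalities, which keeps the fusion stable even when one modality is noisy.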
(This article belongs to the Section Algorithms for Multidisciplinary Applications)

23 pages, 7940 KB  
Article
A Novel Iodine–Dextrin Complex Exhibits No Acute or Subacute Toxicity and Enhances Azithromycin Efficacy in an LPS-Induced Sepsis Model
by Nailya Ibragimova, Arailym Aitynova, Seitzhan Turganbay, Marina Lyu, Alexander Ilin, Karina Vassilyeva, Diana Issayeva, Tamari Gapurkhaeva, Arkadiy Krasnoshtanov, Galina Ponomareva and Amir Azembayev
Pharmaceutics 2025, 17(8), 1040; https://doi.org/10.3390/pharmaceutics17081040 - 11 Aug 2025
Viewed by 1271
Abstract
Background/Objectives: Our work was designed to study the physicochemical properties, safety profile, pharmacokinetics, and prophylactic efficacy of an original iodine–dextrin-based pharmaceutical formulation (PA), both alone and in combination with azithromycin (AZ), in a murine model of LPS-induced sepsis. Methods/Results: UV–vis and 1H-NMR spectroscopy confirmed the formation of a stable iodine–dextrin complex, with triiodide anions stabilized by hydrogen bonding and donor–acceptor interactions. No clinical signs of acute toxicity were observed at doses up to 5000 mg/kg, and subacute administration (62.5 and 125 mg/kg) showed no adverse effects on hematological or biochemical parameters. A mild, non-pathological enlargement of thyrocytes and parallel increases in TSH, T3, and T4 levels were observed at 125 mg/kg, consistent with physiological adaptation to iodine. Pharmacokinetic analysis revealed high oral bioavailability (~92%), a prolonged half-life (~21 h), and wide tissue distribution with low clearance. In the sepsis model, pretreatment with AZ+PA alleviated clinical symptoms, maintained body weight, and significantly improved hematological parameters, reducing WBC and CRP levels. The combination also decreased plasma IL-6 and TNF-α concentrations more effectively than either agent alone, indicating a synergistic anti-inflammatory effect. Histological analysis confirmed that PA, particularly in combination with AZ, mitigated LPS-induced tissue injury in the liver, kidney, and lungs. Conclusions: These findings suggest that PA is a safe, bioavailable compound with immunomodulatory properties that enhance azithromycin’s protective effects during systemic inflammation. This supports its potential use as a prophylactic agent in clinical settings, such as preoperative immune modulation to prevent sepsis-related complications. Full article
(This article belongs to the Section Biopharmaceutics)

21 pages, 3746 KB  
Article
DCP: Learning Accelerator Dataflow for Neural Networks via Propagation
by Peng Xu, Wenqi Shao and Ping Luo
Electronics 2025, 14(15), 3085; https://doi.org/10.3390/electronics14153085 - 1 Aug 2025
Cited by 1 | Viewed by 1661
Abstract
Deep neural network (DNN) hardware (HW) accelerators have achieved great success in improving DNNs’ performance and efficiency. One key reason is the dataflow used in executing a DNN layer, including on-chip data partitioning, computation parallelism, and scheduling policy, which has a large impact on latency and energy consumption. Unlike prior works that required considerable effort from HW engineers to design suitable dataflows for different DNNs, this work proposes an efficient data-centric approach, named Dataflow Code Propagation (DCP), to automatically find the optimal dataflow for DNN layers in seconds without human effort. It has several attractive benefits that prior studies lack: (i) We translate the HW dataflow configuration into a code representation in a unified dataflow coding space, which can be optimized by back-propagating gradients given a DNN layer or network. (ii) DCP learns a neural predictor to efficiently update the dataflow codes towards the desired gradient directions to minimize various optimization objectives, e.g., latency and energy. (iii) It can be easily generalized to unseen HW configurations in a zero-shot or few-shot manner, without using additional training data. Extensive experiments on several representative models such as MobileNet, ResNet, and ViT show that DCP outperforms its counterparts in various settings. Full article
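The optimize-by-propagation idea in (i) and (ii) can be sketched as follows. This is a minimal sketch of the concept, not the authors' code: the predictor here is a random untrained network, and all dimensions and names are invented for illustration.

```python
import torch
import torch.nn as nn

# Sketch of the DCP idea: a dataflow configuration is relaxed into a
# continuous "code" vector, a neural predictor maps (code, layer
# features) to a cost such as latency, and gradients are back-propagated
# through the frozen predictor to update the code itself.
torch.manual_seed(0)

predictor = nn.Sequential(         # stands in for DCP's cost predictor
    nn.Linear(16 + 8, 64), nn.ReLU(), nn.Linear(64, 1)
)
for p in predictor.parameters():
    p.requires_grad_(False)        # predictor frozen; only the code moves

layer_feat = torch.randn(1, 8)                  # fixed DNN-layer features
code = torch.randn(1, 16, requires_grad=True)   # dataflow code to optimize
opt = torch.optim.Adam([code], lr=0.1)

costs = []
for _ in range(50):
    cost = predictor(torch.cat([code, layer_feat], dim=-1)).squeeze()
    opt.zero_grad()
    cost.backward()                # gradient flows into `code`, not the net
    opt.step()
    costs.append(cost.item())      # predicted cost should trend downward
```

In the real method the predictor is trained on measured costs and the continuous code is decoded back into a discrete dataflow; the sketch only shows why gradient updates on the code are cheap once such a predictor exists.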
(This article belongs to the Special Issue Applied Machine Learning in Data Science)
