Search Results (108)

Search Parameters:
Keywords = FashionMNIST

21 pages, 1147 KB  
Article
AI-Based Steganography Method to Enhance the Information Security of Hidden Messages in Digital Images
by Nhi Do Ngoc Huynh, Jiajun Jiang, Chung-Hao Chen and Wen-Chao Yang
Electronics 2025, 14(22), 4490; https://doi.org/10.3390/electronics14224490 - 17 Nov 2025
Abstract
With the increasing sophistication of Artificial Intelligence (AI), traditional digital steganography methods face a growing risk of being detected and compromised. Adversarial attacks, in particular, pose a significant threat to the security and robustness of hidden information. To address these challenges, this paper proposes a novel AI-based steganography framework designed to enhance the security of concealed messages within digital images. Our approach introduces a multi-stage embedding process that utilizes a sequence of encoder models, including a base encoder, a residual encoder, and a dense encoder, to create a more complex and secure hiding environment. To further improve robustness, we integrate Wavelet Transforms with various deep learning architectures, namely Convolutional Neural Networks (CNNs), Bayesian Neural Networks (BNNs), and Graph Convolutional Networks (GCNs). We conducted a comprehensive set of experiments on the FashionMNIST and MNIST datasets to evaluate our framework’s performance against several adversarial attacks. The results demonstrate that our multi-stage approach significantly enhances resilience. Notably, while CNN architectures provide the highest baseline accuracy, BNNs exhibit superior intrinsic robustness against gradient-based attacks. For instance, under the Fast Gradient Sign Method (FGSM) attack on the MNIST dataset, our BNN-based models maintained an accuracy of over 98%, whereas the performance of comparable CNN models dropped sharply to between 10% and 18%. This research provides a robust and effective method for developing next-generation secure steganography systems. Full article
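
The headline robustness numbers come from evaluating classifiers under the Fast Gradient Sign Method. As a point of reference, a minimal PyTorch FGSM evaluation loop looks like the sketch below; the model, data loader, and epsilon are placeholders, not the paper's configuration.

```python
# Minimal FGSM robustness check (illustrative; the paper's models and settings are not reproduced).
import torch
import torch.nn.functional as F

def fgsm_accuracy(model, loader, epsilon, device="cpu"):
    """Return (clean accuracy, FGSM-perturbed accuracy) for a classifier."""
    model.eval()
    clean_correct, adv_correct, total = 0, 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x.requires_grad_(True)
        logits = model(x)
        loss = F.cross_entropy(logits, y)
        model.zero_grad()
        loss.backward()
        # FGSM: one signed-gradient step, clipped back to the valid pixel range.
        x_adv = (x + epsilon * x.grad.sign()).clamp(0.0, 1.0)
        with torch.no_grad():
            clean_correct += (logits.argmax(1) == y).sum().item()
            adv_correct += (model(x_adv).argmax(1) == y).sum().item()
        total += y.size(0)
    return clean_correct / total, adv_correct / total
```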

29 pages, 3642 KB  
Article
Securing IoT Vision Systems: An Unsupervised Framework for Adversarial Example Detection Integrating Spatial Prototypes and Multidimensional Statistics
by Naile Wang, Jian Li, Chunhui Zhang and Dejun Zhang
Sensors 2025, 25(21), 6658; https://doi.org/10.3390/s25216658 - 1 Nov 2025
Viewed by 293
Abstract
The deployment of deep learning models in Internet of Things (IoT) systems is increasingly threatened by adversarial attacks. To address the challenge of effectively detecting adversarial examples generated by Generative Adversarial Networks (AdvGANs), this paper proposes an unsupervised detection method that integrates spatial statistical features and multidimensional distribution characteristics. First, a collection of adversarial examples under four different attack intensities was constructed on the CIFAR-10 dataset. Then, based on the VGG16 and ResNet50 classification models, a dual-module collaborative architecture was designed: Module A extracted spatial statistics from convolutional layers and constructed category prototypes to calculate similarity, while Module B extracted multidimensional statistical features and characterized distribution anomalies using the Mahalanobis distance. Experimental results showed that the proposed method achieved a maximum AUROC of 0.9937 for detecting AdvGAN attacks on ResNet50 and 0.9753 on VGG16. Furthermore, it achieved AUROC scores exceeding 0.95 against traditional attacks such as FGSM and PGD, demonstrating its cross-attack generalization capability. Cross-dataset evaluation on Fashion-MNIST confirms its robust generalization across data domains. This study presents an effective solution for unsupervised adversarial example detection, without requiring adversarial samples for training, making it suitable for a wide range of attack scenarios. These findings highlight the potential of the proposed method for enhancing the robustness of IoT systems in security-critical applications. Full article
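
Module B's distribution-anomaly scoring relies on the Mahalanobis distance over deep features. A minimal sketch of that standard construction, assuming class-conditional Gaussians with a tied covariance (not the paper's exact module), is:

```python
# Illustrative Mahalanobis-distance detector over penultimate-layer features
# (a common construction; not the paper's exact Module B).
import numpy as np

def fit_gaussian_stats(features, labels):
    """Per-class means and a shared (tied) covariance, as in Mahalanobis OOD scoring."""
    classes = np.unique(labels)
    means = {c: features[labels == c].mean(axis=0) for c in classes}
    centered = np.concatenate([features[labels == c] - means[c] for c in classes])
    cov = np.cov(centered, rowvar=False) + 1e-6 * np.eye(features.shape[1])
    return means, np.linalg.inv(cov)

def mahalanobis_score(x_feat, means, cov_inv):
    """Anomaly score = distance to the nearest class mean; large => likely adversarial."""
    dists = [float((x_feat - mu) @ cov_inv @ (x_feat - mu)) for mu in means.values()]
    return min(dists)
```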
(This article belongs to the Special Issue IoT Network Security (Second Edition))

21 pages, 2519 KB  
Article
Efficient Lightweight Image Classification via Coordinate Attention and Channel Pruning for Resource-Constrained Systems
by Yao-Liang Chung
Future Internet 2025, 17(11), 489; https://doi.org/10.3390/fi17110489 - 25 Oct 2025
Viewed by 439
Abstract
Image classification is central to computer vision, supporting applications from autonomous driving to medical imaging, yet state-of-the-art convolutional neural networks remain constrained by heavy floating-point operations (FLOPs) and parameter counts on edge devices. To address this accuracy–efficiency trade-off, we propose a unified lightweight framework built on a pruning-aware coordinate attention block (PACB). PACB integrates coordinate attention (CA) with L1-regularized channel pruning, enriching feature representation while enabling structured compression. Applied to MobileNetV3 and RepVGG, the framework achieves substantial efficiency gains. On GTSRB, MobileNetV3 parameters drop from 16.239 M to 9.871 M (–6.37 M) and FLOPs from 11.297 M to 8.552 M (–24.3%), with accuracy improving from 97.09% to 97.37%. For RepVGG, parameters fall from 7.683 M to 7.093 M (–0.59 M) and FLOPs from 31.264 M to 27.918 M (–3.35 M), with only ~0.51% average accuracy loss across CIFAR-10, Fashion-MNIST, and GTSRB. Complexity analysis further confirms PACB does not increase asymptotic order, since the additional CA operations contribute only lightweight lower-order terms. These results demonstrate that coupling CA with structured pruning yields a scalable accuracy–efficiency trade-off under hardware-agnostic metrics, making PACB a promising, deployment-ready solution for mobile and edge applications. Full article
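
The pruning half of PACB follows the familiar L1-on-channel-scales recipe. A hedged sketch of that idea, using BatchNorm scale factors as the pruning signal (the coordinate-attention coupling is omitted), is:

```python
# Sketch of L1-regularized channel selection via BatchNorm scale factors
# (the standard "network slimming" recipe; PACB's coordinate-attention coupling is not shown).
import torch
import torch.nn as nn

def bn_l1_penalty(model, lam=1e-4):
    """L1 penalty on BN gammas, added to the task loss during training."""
    return lam * sum(m.weight.abs().sum()
                     for m in model.modules() if isinstance(m, nn.BatchNorm2d))

def prunable_channel_mask(bn: nn.BatchNorm2d, keep_ratio=0.7):
    """Keep the channels with the largest |gamma|; the rest can be removed structurally."""
    k = max(1, int(keep_ratio * bn.weight.numel()))
    threshold = bn.weight.abs().sort(descending=True).values[k - 1]
    return bn.weight.abs() >= threshold
```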
(This article belongs to the Special Issue Clustered Federated Learning for Networks)

36 pages, 6685 KB  
Article
From Predictive Coding to EBPM: A Novel DIME Integrative Model for Recognition and Cognition
by Ionel Cristian Vladu, Nicu George Bîzdoacă, Ionica Pirici and Bogdan Cătălin
Appl. Sci. 2025, 15(20), 10904; https://doi.org/10.3390/app152010904 - 10 Oct 2025
Viewed by 668
Abstract
Predictive Coding (PC) frameworks claim to model recognition via prediction–error loops, but they often lack explicit biological implementation of fast familiar recognition and impose latency that limits real-time robotic control. We begin with Experience-Based Pattern Matching (EBPM), a biologically grounded mechanism inspired by neural engram reactivation, enabling near-instantaneous recognition of familiar stimuli without iterative inference. Building upon this, we propose Dynamic Integrative Matching and Encoding (DIME), a hybrid system that relies on EBPM under familiar and low-uncertainty conditions and dynamically engages PC when confronted with novelty or high uncertainty. We evaluate EBPM, PC, and DIME across multiple image datasets (MNIST, Fashion-MNIST, CIFAR-10) and on a robotic obstacle-course simulation. Results from multi-seed experiments with ablation and complexity analyses show that EBPM achieves minimal latency (e.g., ~0.03 ms/ex in MNIST, ~0.026 ms/step in robotics) but poor performance in novel or noisy cases; PC exhibits robustness at a high cost; DIME delivers strong trade-offs—boosted accuracy in familiar clean situations (+4–5% over EBPM on CIFAR-10), while cutting PC invocations by ~50% relative to pure PC. Our contributions: (i) formalizing EBPM as a neurocomputational algorithm built from biologically plausible principles, (ii) developing DIME as a dynamic EBPM–PC integrator, (iii) providing ablation and complexity analyses illuminating component roles, and (iv) offering empirical validation in both perceptual and embodied robotic scenarios—paving the way for low-latency recognition systems. Full article
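
The DIME routing logic (fast prototype matching by default, iterative inference only under uncertainty) can be pictured with a toy sketch; the prototypes, similarity threshold, and slow_inference hook below are assumptions for illustration, not the authors' implementation.

```python
# Toy sketch of an EBPM-style fast path with a PC-style fallback under uncertainty.
import numpy as np

def ebpm_match(x, prototypes, threshold=0.8):
    """Cosine similarity to stored class prototypes; None if nothing is familiar enough."""
    x = x / (np.linalg.norm(x) + 1e-12)
    sims = prototypes @ x / (np.linalg.norm(prototypes, axis=1) + 1e-12)
    best = int(np.argmax(sims))
    return (best, float(sims[best])) if sims[best] >= threshold else (None, float(sims[best]))

def dime_predict(x, prototypes, slow_inference, threshold=0.8):
    """Use the near-instant match when confident; otherwise fall back to iterative inference."""
    label, _ = ebpm_match(x, prototypes, threshold)
    return label if label is not None else slow_inference(x)
```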
(This article belongs to the Section Robotics and Automation)

24 pages, 1699 KB  
Article
Efficient Sparse MLPs Through Motif-Level Optimization Under Resource Constraints
by Xiaotian Chen, Hongyun Liu and Seyed Sahand Mohammadi Ziabari
AI 2025, 6(10), 266; https://doi.org/10.3390/ai6100266 - 9 Oct 2025
Viewed by 723
Abstract
We study motif-based optimization for sparse multilayer perceptrons (MLPs), where weights are shared and updated at the level of small neuron groups (‘motifs’) rather than individual connections. Building on Sparse Evolutionary Training (SET), our approach reduces the number of unique parameters and redundant multiply–accumulate operations by exploiting block-structured sparsity. Across Fashion-MNIST and a lung X-ray dataset, our Motif-SET improves training/inference efficiency with modest accuracy trade-offs, and we provide a principled recipe to choose motif size based on accuracy and efficiency budgets. We further compare against representative modern sparse training and compression methods, analyze failure modes such as overly large motifs, and outline real-world constraints on mobile/embedded targets. Our results and ablations indicate that motif size m=2 often offers a strong balance between compute and accuracy under resource constraints. Full article
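
The core of motif-level optimization is sharing one trainable value across each small block of a layer's weight matrix. A minimal PyTorch layer illustrating that idea (the SET-style topology evolution is not shown) could look like:

```python
# Sketch of motif-level weight sharing: each m x m block of the dense weight matrix reuses a
# single trainable scalar (m=2 mirrors the paper's sweet spot; Motif-SET's update rule is omitted).
import torch
import torch.nn as nn

class MotifLinear(nn.Module):
    def __init__(self, in_features, out_features, m=2):
        super().__init__()
        assert in_features % m == 0 and out_features % m == 0
        self.m = m
        self.shared = nn.Parameter(torch.randn(out_features // m, in_features // m) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        # Expand block-level parameters to the full matrix by tiling each scalar over its motif.
        w = torch.kron(self.shared, torch.ones(self.m, self.m, device=x.device))
        return x @ w.t() + self.bias
```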

24 pages, 4755 KB  
Article
Transfer Entropy and O-Information to Detect Grokking in Tensor Network Multi-Class Classification Problems
by Domenico Pomarico, Roberto Cilli, Alfonso Monaco, Loredana Bellantuono, Marianna La Rocca, Tommaso Maggipinto, Giuseppe Magnifico, Marlis Ontivero Ortega, Ester Pantaleo, Sabina Tangaro, Sebastiano Stramaglia, Roberto Bellotti and Nicola Amoroso
Technologies 2025, 13(10), 438; https://doi.org/10.3390/technologies13100438 - 29 Sep 2025
Viewed by 507
Abstract
Quantum-enhanced machine learning, encompassing both quantum algorithms and quantum-inspired classical methods such as tensor networks, offers promising tools for extracting structure from complex, high-dimensional data. In this work, we study the training dynamics of Matrix Product State (MPS) classifiers applied to three-class problems, using both fashion MNIST and hyperspectral satellite imagery as representative datasets. We investigate the phenomenon of grokking, where generalization emerges suddenly after memorization, by tracking entanglement entropy, local magnetization, and model performance across training sweeps. Additionally, we employ information-theory tools to gain deeper insights: transfer entropy is used to reveal causal dependencies between label-specific quantum masks, while O-information captures the shift from synergistic to redundant correlations among class outputs. Our results show that grokking in the fashion MNIST task coincides with a sharp entanglement transition and a peak in redundant information, whereas the overfitted hyperspectral model retains synergistic, disordered behavior. These findings highlight the relevance of high-order information dynamics in quantum-inspired learning and emphasize the distinct learning behaviors that emerge in multi-class classification, offering a principled framework to interpret generalization in quantum machine learning architectures. Full article
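
For readers unfamiliar with O-information, a compact Gaussian (covariance-based) estimator conveys the redundancy-versus-synergy reading used in the analysis; this is a generic estimator, not the authors' pipeline.

```python
# Gaussian O-information estimator: positive values indicate redundancy-dominated
# interactions, negative values indicate synergy (standard definition, illustrative only).
import numpy as np

def gaussian_entropy(cov):
    d = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (d * np.log(2 * np.pi * np.e) + logdet)

def o_information(samples):
    """samples: array of shape (n_samples, n_vars)."""
    n = samples.shape[1]
    cov = np.cov(samples, rowvar=False)
    omega = (n - 2) * gaussian_entropy(cov)
    for i in range(n):
        rest = [j for j in range(n) if j != i]
        omega += gaussian_entropy(cov[np.ix_([i], [i])]) - gaussian_entropy(cov[np.ix_(rest, rest)])
    return omega
```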
(This article belongs to the Section Quantum Technologies)

24 pages, 902 KB  
Article
Differentiable Selection of Bit-Width and Numeric Format for FPGA-Efficient Deep Networks
by Kawthar Dellel, Emanuel Trabes, Aymen Zayed, Hassene Faiedh and Carlos Valderrama
Electronics 2025, 14(18), 3715; https://doi.org/10.3390/electronics14183715 - 19 Sep 2025
Viewed by 563
Abstract
Quantization-aware training (QAT) has emerged as a key strategy for enabling efficient deep learning inference on resource-constrained platforms. Yet, most existing approaches rely on static, manually selected numeric formats—fixed-point or floating-point—and fixed bit-widths, limiting their adaptability and often requiring extensive design effort or architecture search. In this work, we introduce a novel QAT framework that breaks this rigidity by jointly learning, during training, both the numeric representation format and the associated bit-widths in an end-to-end differentiable manner. At the core of our method lies a unified parameterization that is capable of emulating both fixed- and floating-point arithmetic, paired with a bit-aware loss function that penalizes excessive precision in a hardware-aligned fashion. We demonstrate that our approach achieves state-of-the-art trade-offs between accuracy and compression on MNIST, CIFAR-10, and CIFAR-100, reducing average bit-widths to as low as 1.4 with minimal accuracy loss. Furthermore, FPGA implementation using Xilinx FINN confirms over 5× LUT and 4× BRAM savings. This is the first QAT method to unify numeric format learning with differentiable precision control, enabling highly deployable, precision-adaptive deep neural networks. Full article
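
The key mechanism is making the bit-width itself a trainable parameter inside a fake-quantization step. A minimal straight-through sketch of that idea (the paper's unified fixed-/floating-point parameterization and exact penalty are not reproduced) is:

```python
# Fake quantization with a learnable, continuous bit-width and a bit-aware penalty
# (straight-through estimator around the rounding op only; illustrative, not the paper's method).
import torch
import torch.nn as nn

class LearnableBitQuant(nn.Module):
    def __init__(self, init_bits=8.0):
        super().__init__()
        self.bits = nn.Parameter(torch.tensor(init_bits))  # relaxed, continuous bit-width

    def forward(self, w):
        bits = self.bits.clamp(1.0, 8.0)
        levels = 2.0 ** bits - 1.0
        scale = w.abs().max().clamp_min(1e-8)
        x = w / scale * levels
        x_rounded = x + (torch.round(x) - x).detach()  # STE: round in forward, identity in backward
        return x_rounded / levels * scale

def bit_penalty(quantizers, lam=1e-3):
    """Hardware-aligned regularizer: penalize large learned bit-widths."""
    return lam * sum(q.bits.clamp(1.0, 8.0) for q in quantizers)
```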
(This article belongs to the Special Issue Intelligent Embedded Systems: Latest Advances and Applications)

22 pages, 13597 KB  
Article
A Periodic Mapping Activation Function: Mathematical Properties and Application in Convolutional Neural Networks
by Xu Chen, Yinlei Cheng, Siqin Wang, Guangliang Sang, Ken Nah and Jianmin Wang
Mathematics 2025, 13(17), 2843; https://doi.org/10.3390/math13172843 - 3 Sep 2025
Cited by 2 | Viewed by 1169
Abstract
Activation functions play a crucial role in ensuring training stability, convergence speed, and overall performance in both convolutional and attention-based networks. In this study, we introduce two novel activation functions, each incorporating a sine component and a constraint term. To assess their effectiveness, we replace the activation functions in four representative architectures—VGG16, ResNet50, DenseNet121, and Vision Transformers—covering a spectrum from lightweight to high-capacity models. We conduct extensive evaluations on four benchmark datasets (CIFAR-10, CIFAR-100, MNIST, and Fashion-MNIST), comparing our methods against seven widely used activation functions. The results consistently demonstrate that our proposed functions achieve superior performance across all tested models and datasets. From a design application perspective, the proposed functional periodic structure also facilitates rich and structurally stable activation visualizations, enabling designers to trace model attention, detect surface biases early, and make informed aesthetic or accessibility decisions during interface prototyping. Full article
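
The abstract does not spell out the two proposed formulas, so the sketch below uses a generic Snake-style form, x + sin^2(ax)/a, purely to illustrate what a sine component with bounded gradients looks like as a drop-in module replacing ReLU in architectures such as VGG16 or ResNet50.

```python
# Illustrative periodic activation (assumed Snake-style form; not the paper's exact functions).
import torch
import torch.nn as nn

class PeriodicActivation(nn.Module):
    def __init__(self, a=1.0):
        super().__init__()
        self.a = nn.Parameter(torch.tensor(a))  # frequency of the sine component

    def forward(self, x):
        # Derivative is 1 + sin(2ax), so the map stays non-decreasing with bounded slope.
        return x + torch.sin(self.a * x) ** 2 / self.a.clamp_min(1e-4)
```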

20 pages, 1357 KB  
Article
FedPLDSE: Submodel Extraction for Federated Learning in Heterogeneous Smart City Devices
by Xiaochi Hou, Zhigang Wang, Xinhao Wang and Junfeng Zhao
Big Data Cogn. Comput. 2025, 9(9), 226; https://doi.org/10.3390/bdcc9090226 - 30 Aug 2025
Viewed by 718
Abstract
Federated learning enables collaborative model training across distributed devices while preserving data privacy. However, in real-world environments such as smart cities, heterogeneous and resource-constrained edge devices often render existing methods impractical. Low-power sensors and cameras struggle to complete full-model training, while high-performance devices remain idly waiting for others. Knowledge distillation approaches rely on public datasets that are rarely available or poorly aligned with urban data, which limits their effectiveness in deployment. These limitations lead to inefficiencies, unstable convergence, and poor adaptability in diverse urban networks. Partial training alleviates some challenges by allowing clients to train submodels tailored to their capacity, but existing methods still incur high computational costs for identifying important parameters and suffer from uneven parameter updates, reducing model effectiveness. To address these challenges, we propose Parameter-Level Dynamic Submodel Extraction (PLDSE), a lightweight and adaptive framework for federated learning. PLDSE estimates parameter importance using gradient-based scores on a server-side validation set, reducing overhead while accurately identifying critical parameters. In addition, it integrates a rolling scheduling mechanism to rotate unselected parameters, ensuring full coverage and consistent model updates. Experiments on CIFAR-10, CIFAR-100, and Fashion-MNIST demonstrate superior accuracy and faster convergence, with PLDSE achieving 62.82% on CIFAR-100 under low heterogeneity and 61.51% under high heterogeneity, outperforming prior methods. Full article
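
PLDSE's parameter scoring can be pictured as a gradient pass on a server-side validation batch followed by per-tensor top-k masking; the sketch below shows that generic pattern only (the rolling rotation of unselected parameters is omitted).

```python
# Gradient-based parameter importance on a validation batch, used to size a submodel mask
# to a client's capacity (illustrative; not the paper's exact scoring or scheduling).
import torch
import torch.nn.functional as F

def importance_scores(model, val_batch):
    x, y = val_batch
    loss = F.cross_entropy(model(x), y)
    grads = torch.autograd.grad(loss, [p for p in model.parameters() if p.requires_grad])
    return [g.abs() for g in grads]

def submodel_masks(scores, keep_ratio):
    """Per-tensor top-k masks; keep_ratio in (0, 1] reflects the client's capacity."""
    masks = []
    for s in scores:
        k = max(1, int(keep_ratio * s.numel()))
        threshold = s.flatten().kthvalue(s.numel() - k + 1).values
        masks.append((s >= threshold).float())
    return masks
```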

23 pages, 7175 KB  
Article
Prunability of Multi-Layer Perceptrons Trained with the Forward-Forward Algorithm
by Mitko Nikov, Damjan Strnad and David Podgorelec
Mathematics 2025, 13(16), 2668; https://doi.org/10.3390/math13162668 - 19 Aug 2025
Viewed by 715
Abstract
We explore the sparsity and prunability of multi-layer perceptrons (MLPs) trained using the Forward-Forward (FF) algorithm, an alternative to backpropagation (BP) that replaces the backward pass with local, contrastive updates at each layer. We analyze the sparsity of the weight matrices during training using multiple metrics, and test the prunability of FF networks on the MNIST, FashionMNIST and CIFAR-10 datasets. We also propose FFLib—a novel, modular PyTorch-based library for developing, training and analyzing FF models along with a suite of FF-based architectures, including FFNN, FFNN+C and FFRNN. In addition to structural sparsity, we describe and apply a new method for visualizing the functional sparsity of neural activations across different architectures using the HSV color space. Moreover, we conduct a sensitivity analysis to assess the impact of hyperparameters on model performance and sparsity. Finally, we perform pruning experiments, showing that simple FF-based MLPs exhibit significantly greater robustness to one-shot neuron pruning than traditional BP-trained networks, and a possible 8-fold increase in compression ratios while maintaining comparable accuracy on the MNIST dataset. Full article
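
One-shot neuron pruning of an MLP layer by incoming-weight magnitude, the operation whose robustness the paper measures, can be sketched as follows; keep_ratio=0.125 mirrors the reported 8-fold compression, and the magnitude criterion is a common default rather than FFLib's own utility.

```python
# One-shot neuron pruning for a pair of linear layers: drop hidden units with the smallest
# incoming-weight norms and rewire the next layer (illustrative magnitude criterion).
import torch
import torch.nn as nn

def prune_hidden_neurons(fc1: nn.Linear, fc2: nn.Linear, keep_ratio=0.125):
    """Keep the top fraction of fc1's output units by L2 norm; shrink fc1 and fc2 to match."""
    norms = fc1.weight.norm(dim=1)                      # one norm per hidden neuron
    k = max(1, int(keep_ratio * norms.numel()))
    keep = norms.topk(k).indices.sort().values
    new_fc1 = nn.Linear(fc1.in_features, k)
    new_fc2 = nn.Linear(k, fc2.out_features)
    with torch.no_grad():
        new_fc1.weight.copy_(fc1.weight[keep])
        new_fc1.bias.copy_(fc1.bias[keep])
        new_fc2.weight.copy_(fc2.weight[:, keep])
        new_fc2.bias.copy_(fc2.bias)
    return new_fc1, new_fc2
```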
(This article belongs to the Section E1: Mathematics and Computer Science)

21 pages, 6060 KB  
Article
PFSKANs: A Novel Pixel-Level Feature Selection Model Based on Kolmogorov–Arnold Networks
by Rui Yang, Michael V. Basin, Guangzhe Yao and Hongzheng Zeng
Sensors 2025, 25(16), 4982; https://doi.org/10.3390/s25164982 - 12 Aug 2025
Viewed by 625
Abstract
Inspired by the interpretability of Kolmogorov–Arnold Networks (KANs), a novel Pixel-level Feature Selection (PFS) model based on KANs (PFSKANs) is proposed as a fundamentally distinct alternative to trainable Convolutional Neural Networks (CNNs) and transformers in computer vision tasks. We modify the simplification techniques of KANs to detect key pixels with high contribution scores directly in the input image. Specifically, a trainable selection procedure is intuitively visualized and performed only once, since the obtained interpretable pixels can subsequently be identified and dimensionally standardized using the proposed mathematical approach. Experiments on image classification tasks using the MNIST, Fashion-MNIST, CIFAR-10, and CIFAR-100 datasets demonstrate that PFSKANs achieve comparable performance to CNNs in terms of accuracy, parameter efficiency, and training time. Full article
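
Pixel-level feature selection in general reduces to scoring each input pixel and keeping a fixed-size subset. The sketch below uses a plain input-gradient score as the contribution measure; the KAN-based simplification PFSKANs actually use is not reproduced.

```python
# Generic pixel-level feature selection: rank pixels by an input-gradient contribution score
# and keep a fixed number of them (illustrative; not the KAN-based procedure).
import torch
import torch.nn.functional as F

def pixel_importance(model, loader, device="cpu"):
    """Accumulate |d loss / d pixel| over a dataset as a per-pixel contribution score."""
    total = None
    for x, y in loader:
        x = x.to(device).requires_grad_(True)
        loss = F.cross_entropy(model(x), y.to(device))
        grad = torch.autograd.grad(loss, x)[0].abs().mean(dim=0)  # average over the batch
        total = grad if total is None else total + grad
    return total

def select_pixels(scores, n_keep=196):
    """Indices of the n_keep highest-scoring pixels (flattened image coordinates)."""
    return scores.flatten().topk(n_keep).indices
```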

26 pages, 2178 KB  
Article
Testing Neural Architecture Search Efficient Evaluation Methods in DeepGA
by Jesús-Arnulfo Barradas-Palmeros, Carlos-Alberto López-Herrera, Efrén Mezura-Montes, Héctor-Gabriel Acosta-Mesa and Adriana-Laura López-Lobato
Math. Comput. Appl. 2025, 30(4), 74; https://doi.org/10.3390/mca30040074 - 17 Jul 2025
Viewed by 831
Abstract
Neural Architecture Search (NAS) aims to automate the design process of Deep Neural Networks, reducing the Deep Learning (DL) expertise required and avoiding a trial-and-error process. Nonetheless, one of the main drawbacks of NAS is the high consumption of computational resources. Consequently, efficient evaluation methods (EEMs) to assess the quality of candidate architectures are an open research problem. This work tests various EEMs in the Deep Genetic Algorithm (DeepGA), including early stopping, population memory, and training-free proxies. The Fashion MNIST, CIFAR-10, and CIFAR-100 datasets were used for experimentation. The results show that population memory has a valuable impact on avoiding repeated evaluations. Additionally, early stopping achieved competitive performance while significantly reducing the computational cost of the search process. The training-free configurations using the Logsynflow and Linear Regions proxies, as well as a combination of both, were only partially competitive but dramatically reduced the search time. Finally, a comparison of the architectures and hyperparameters obtained with the different algorithm configurations is presented. The training-free search processes resulted in deeper architectures with more fully connected layers and skip connections than the ones obtained with accuracy-guided search configurations. Full article
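
The population-memory EEM amounts to memoizing fitness under a canonical architecture encoding so a repeated genotype is never retrained. A minimal sketch (DeepGA's encoding and evaluator are assumed, not shown):

```python
# Population memory for NAS: cache fitness by a hashable encoding of the architecture so
# repeated genotypes skip the expensive training step (illustrative of the idea only).
fitness_cache = {}

def evaluate_with_memory(genotype, train_and_score):
    """genotype: any hashable architecture encoding; train_and_score: expensive evaluator."""
    key = tuple(genotype)
    if key not in fitness_cache:
        fitness_cache[key] = train_and_score(genotype)
    return fitness_cache[key]
```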
(This article belongs to the Special Issue Feature Papers in Mathematical and Computational Applications 2025)

17 pages, 1331 KB  
Article
A Neural Network Training Method Based on Distributed PID Control
by Kun Jiang
Symmetry 2025, 17(7), 1129; https://doi.org/10.3390/sym17071129 - 14 Jul 2025
Cited by 2 | Viewed by 757
Abstract
In the previous article, we introduced a neural network framework based on symmetric differential equations. This novel framework exhibits complete symmetry, endowing it with perfect mathematical properties. While we have examined some of the system’s mathematical characteristics, a detailed discussion of the network training methodology has not yet been presented. Drawing on the principles of the traditional backpropagation algorithm, this study proposes an alternative training approach that utilizes differential equation signal propagation instead of chain rule derivation. This approach not only preserves the effectiveness of training but also offers enhanced biological interpretability. The foundation of this methodology lies in the system’s reversibility, which stems from its inherent symmetry—a key aspect of our research. However, this method alone is insufficient for effective neural network training. To address this, we further introduce a distributed Proportional–Integral–Derivative (PID) control approach, emphasizing its implementation within a closed system. By incorporating this method, we achieved both faster training speeds and improved accuracy. This approach not only offers novel insights into neural network training but also extends the scope of research into control methodologies. To validate its effectiveness, we apply this method to the MNIST (Modified National Institute of Standards and Technology database) and Fashion-MNIST datasets, demonstrating its practical utility. Full article
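
The control idea can be pictured as a PID law applied to the training error signal. The toy updater below treats the gradient as that error; it illustrates only the proportional, integral, and derivative terms and is not the paper's distributed, differential-equation formulation.

```python
# Toy PID-style parameter update treating the gradient as the error signal (illustrative only).
import torch

class PIDUpdater:
    def __init__(self, params, lr=0.01, kp=1.0, ki=0.1, kd=0.5):
        self.params = list(params)
        self.lr, self.kp, self.ki, self.kd = lr, kp, ki, kd
        self.integral = [torch.zeros_like(p) for p in self.params]
        self.prev_grad = [torch.zeros_like(p) for p in self.params]

    @torch.no_grad()
    def step(self):
        for p, i_buf, prev in zip(self.params, self.integral, self.prev_grad):
            if p.grad is None:
                continue
            g = p.grad
            i_buf += g                       # integral of the error
            d = g - prev                     # derivative of the error
            prev.copy_(g)
            p -= self.lr * (self.kp * g + self.ki * i_buf + self.kd * d)
```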

36 pages, 9139 KB  
Article
On the Synergy of Optimizers and Activation Functions: A CNN Benchmarking Study
by Khuraman Aziz Sayın, Necla Kırcalı Gürsoy, Türkay Yolcu and Arif Gürsoy
Mathematics 2025, 13(13), 2088; https://doi.org/10.3390/math13132088 - 25 Jun 2025
Cited by 1 | Viewed by 1688
Abstract
In this study, we present a comparative analysis of gradient descent-based optimizers frequently used in Convolutional Neural Networks (CNNs), including SGD, mSGD, RMSprop, Adadelta, Nadam, Adamax, Adam, and the recent EVE optimizer. To explore the interaction between optimization strategies and activation functions, we systematically evaluate all combinations of these optimizers with four activation functions—ReLU, LeakyReLU, Tanh, and GELU—across three benchmark image classification datasets: CIFAR-10, Fashion-MNIST (F-MNIST), and Labeled Faces in the Wild (LFW). Each configuration was assessed using multiple evaluation metrics, including accuracy, precision, recall, F1-score, mean absolute error (MAE), and mean squared error (MSE). All experiments were performed using k-fold cross-validation to ensure statistical robustness. Additionally, two-way ANOVA was employed to validate the significance of differences across optimizer–activation combinations. This study aims to highlight the importance of jointly selecting optimizers and activation functions to enhance training dynamics and generalization in CNNs. We also consider the role of critical hyperparameters, such as learning rate and regularization methods, in influencing optimization stability. This work provides valuable insights into the optimizer–activation interplay and offers practical guidance for improving architectural and hyperparameter configurations in CNN-based deep learning models. Full article
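
The benchmarking protocol is a full cross-product of optimizers and activations. A skeleton of such a sweep (factory functions keep runs independent; the CNN architectures, LFW pipeline, and k-fold details are placeholders) might look like:

```python
# Optimizer x activation sweep skeleton (illustrative; not the paper's CNNs or protocol).
import itertools
import torch
import torch.nn as nn

optimizers = {
    "SGD": lambda p: torch.optim.SGD(p, lr=0.01),
    "Adam": lambda p: torch.optim.Adam(p, lr=1e-3),
    "RMSprop": lambda p: torch.optim.RMSprop(p, lr=1e-3),
}
activations = {"ReLU": nn.ReLU, "LeakyReLU": nn.LeakyReLU, "Tanh": nn.Tanh, "GELU": nn.GELU}

def build_model(act_cls, n_classes=10):
    # Small stand-in model for a 28x28 grayscale input (e.g., Fashion-MNIST).
    return nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 256), act_cls(),
                         nn.Linear(256, n_classes))

def sweep(train_and_evaluate):
    """train_and_evaluate(model, optimizer) -> dict of metrics (accuracy, F1, MAE, ...)."""
    results = {}
    for (opt_name, make_opt), (act_name, act_cls) in itertools.product(
            optimizers.items(), activations.items()):
        model = build_model(act_cls)
        results[(opt_name, act_name)] = train_and_evaluate(model, make_opt(model.parameters()))
    return results
```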
(This article belongs to the Special Issue Artificial Intelligence and Data Science, 2nd Edition)

16 pages, 2980 KB  
Article
Enhancing Efficiency and Regularization in Convolutional Neural Networks: Strategies for Optimized Dropout
by Mehdi Ghayoumi
AI 2025, 6(6), 111; https://doi.org/10.3390/ai6060111 - 28 May 2025
Cited by 2 | Viewed by 1703
Abstract
Background/Objectives: Convolutional Neural Networks (CNNs), while effective in tasks such as image classification and language processing, often experience overfitting and inefficient training due to static, structure-agnostic regularization techniques like traditional dropout. This study aims to address these limitations by proposing a more dynamic and context-sensitive dropout strategy. Methods: We introduce Probabilistic Feature Importance Dropout (PFID), a novel regularization method that assigns dropout rates based on the probabilistic significance of individual features. PFID is integrated with adaptive, structured, and contextual dropout strategies, forming a unified framework for intelligent regularization. Results: Experimental evaluation on standard benchmark datasets including CIFAR-10, MNIST, and Fashion MNIST demonstrated that PFID significantly improves performance metrics such as classification accuracy, training loss, and computational efficiency compared to conventional dropout methods. Conclusions: PFID offers a practical and scalable solution for enhancing CNN generalization and training efficiency. Its dynamic nature and feature-aware design provide a strong foundation for future advancements in adaptive regularization for deep learning models. Full article
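
The underlying idea of feature-importance-aware dropout is to give salient units a lower drop probability. The module below sketches that with mean absolute activation as an importance proxy; PFID's probabilistic scoring and its integration with adaptive, structured, and contextual dropout are not reproduced.

```python
# Importance-scaled dropout sketch: features judged more important are dropped less often
# (illustrative proxy; not PFID's actual scoring).
import torch
import torch.nn as nn

class ImportanceDropout(nn.Module):
    def __init__(self, p_min=0.1, p_max=0.5):
        super().__init__()
        self.p_min, self.p_max = p_min, p_max

    def forward(self, x):
        if not self.training:
            return x
        # Importance proxy: mean absolute activation per feature across the batch.
        importance = x.abs().mean(dim=0, keepdim=True)
        norm = (importance - importance.min()) / (importance.max() - importance.min() + 1e-8)
        p = self.p_max - (self.p_max - self.p_min) * norm   # important features dropped less
        mask = torch.bernoulli((1.0 - p).expand_as(x))
        return x * mask / (1.0 - p)                          # inverted-dropout rescaling
```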
(This article belongs to the Section AI Systems: Theory and Applications)
