Symmetry
  • Article
  • Open Access

12 November 2024

Convolutional Neural Networks: A Comprehensive Evaluation and Benchmarking of Pooling Layer Variants

1 Department of Computer Science, National University of Technology, Islamabad 44000, Pakistan
2 Department of Computing, NASTP Institute of Information Technology, Lahore 58810, Pakistan
3 Department of Information Systems, College of Computer and Information Science, King Saud University, Riyadh 11543, Saudi Arabia
4 Department of Computing, Riphah International University, Lahore 39101, Pakistan
This article belongs to the Section Computer

Abstract

Convolutional Neural Networks (CNNs) are a class of deep neural networks that have proven highly effective in areas such as image and video recognition. CNNs typically include several types of layers, such as convolutional layers, activation layers, pooling layers, and fully connected layers, all of which contribute to the network’s ability to recognize patterns and features. The pooling layer, which often follows the convolutional layer, is crucial for reducing computational complexity by performing down-sampling while maintaining essential features. This layer’s role in balancing the symmetry of information across the network is vital for optimal performance. However, the choice of pooling method is often based on intuition, which can lead to less accurate or efficient results. This research compares various standard pooling methods (MAX and AVERAGE pooling) on standard datasets (MNIST, CIFAR-10, and CIFAR-100) to determine the most effective approach in preserving detail, performance, and overall computational efficiency while maintaining the symmetry necessary for robust CNN performance.

1. Introduction

Convolutional neural networks (CNNs) are among the most effective methods for challenging computer-vision and image-analysis tasks such as image segmentation [] and classification []. Each convolutional stage of a CNN combines convolutions, nonlinear activations, and often pooling operators, and is typically followed by one or more fully connected layers. CNNs are feedforward networks because information is passed in only one direction, from input to output. Like artificial neural networks (ANNs), CNNs are rooted in biological principles: their design is inspired by the brain's visual cortex, which alternates layers of simple and higher-order cells []. There are many different CNN architectures, but most consist of convolutional and pooling layers organized into modules. These modules are followed by one or more fully connected layers, as in standard feedforward neural networks, and are typically stacked on top of each other to build complex models []. A typical CNN architecture for a toy image-classification task is shown in Figure 1. Images are fed directly into the network and pass through a series of convolution and pooling layers; the representations formed by these operations are then fed into one or more fully connected layers, the last of which acts as the classifier and produces the output. Although this is the most commonly used basic design in the literature, many architectural improvements have recently been proposed to increase image-classification accuracy or reduce computational overhead.
Figure 1. Standard CNN architecture.
Convolutional layers act as feature extractors, learning feature representations of the input images. Neurons in convolutional layers are organized into feature maps. Each neuron in a feature map is connected, through a set of trainable weights often called a filter bank, to a local neighborhood (its receptive field) in the underlying layer. The input is convolved with the learned weights to create a new feature map, and the output is passed to a nonlinear activation function. All neurons within a feature map are constrained to share the same weights, while different feature maps within the same convolutional layer use different weights, so multiple features can be extracted at each location []. The $k$th output feature map $y_k$ can be defined more formally as
$y_k = f(w_k * x)$
where $x$ denotes the input image, $w_k$ denotes the convolution filter associated with the $k$th feature map, $*$ denotes the 2D convolution operator, and $f(\cdot)$ denotes the nonlinear activation function. The convolution operator computes the inner product of the filter with the input image at every location.
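As a concrete illustration of this operation, the minimal sketch below (assuming TensorFlow/Keras, the framework used later in Section 3.3; the shapes, the ReLU choice of $f(\cdot)$, and all values are illustrative) convolves an input with a single filter and applies the nonlinearity.

```python
import tensorflow as tf

# Illustrative sketch of y_k = f(w_k * x); shapes and values are placeholders.
x = tf.random.normal((1, 28, 28, 1))        # input image: (batch, height, width, channels)
w_k = tf.random.normal((5, 5, 1, 1))        # one 5x5 filter: (height, width, in_channels, out_maps)

# 2D convolution followed by the nonlinearity f(.), chosen here as ReLU for illustration
y_k = tf.nn.relu(tf.nn.conv2d(x, w_k, strides=1, padding="SAME"))
print(y_k.shape)                            # (1, 28, 28, 1): one output feature map
```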
Before the arrival of Convolutional Neural Networks (CNNs), traditional machine learning models such as Support Vector Machines (SVM) [] and K-Nearest Neighbors (KNN) [] were commonly employed for image classification, where each pixel was treated as an individual feature. The introduction of CNNs revolutionized this approach by using convolutional layers to extract multiple features from an image, enhancing the prediction of output values. Since the convolution operation is computationally intensive, pooling layers were integrated into CNNs to make the process more efficient. Pooling reduces the computational load by down-sampling the input, which decreases the number of computations required while preserving the most critical information. The pooling method streamlines the processing within the network, maintaining essential details with significantly lower resource consumption [].
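The down-sampling effect can be made concrete with the following short Keras sketch (an illustration under assumed shapes, not the authors' exact configuration), which applies 2×2 max and average pooling to the same feature map.

```python
import tensorflow as tf

# A 2x2 pooling window halves the spatial resolution, so downstream layers process
# four times fewer positions; shapes here are illustrative.
feature_map = tf.random.normal((1, 32, 32, 16))              # (batch, height, width, channels)

max_pooled = tf.keras.layers.MaxPooling2D(pool_size=2)(feature_map)
avg_pooled = tf.keras.layers.AveragePooling2D(pool_size=2)(feature_map)

print(max_pooled.shape, avg_pooled.shape)                    # (1, 16, 16, 16) (1, 16, 16, 16)
```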
While CNNs have significantly improved image classification, various enhancements have been proposed to further optimize performance, including hybrid models [] that combine CNNs with other machine learning algorithms to improve accuracy or reduce computational complexity. Recent advancements, such as adaptive pooling strategies and dynamic pooling methods, adjust pooling operations based on the input, allowing for better flexibility and feature retention []. However, the impact of these newer methods on standard architectures like AlexNet, ResNet, and LeNet remains an area of active research.
This study aims to provide an overview of various pooling methods, discussing the benefits and drawbacks of each approach (see Table 1). Additionally, we compare their performance in classification tasks using three distinct datasets.
Table 1. Standard pooling methods along with strengths and weaknesses.
The main contributions of this study include the following:
  • The proposed study systematically evaluated multiple CNN architectures—LeNet, AlexNet, and ResNet—across various datasets (MNIST, CIFAR-10, and CIFAR-100). This comprehensive analysis sheds light on how these models perform on datasets of differing complexities and sizes, providing insights into their adaptability and generalization capabilities across different image-classification tasks.
  • By presenting the comparative performance metrics, the proposed study identifies which CNN architectures excel or struggle when applied to specific datasets. This helps researchers to understand the strengths and weaknesses of each model in handling distinct image datasets, aiding in informed model selection for particular tasks or datasets.
  • This study provides a comprehensive comparison of standard pooling methods—max and average pooling—evaluated across different CNN architectures, including CNN, AlexNet, ResNet, and LeNet, on multiple datasets. While prior studies have discussed individual pooling methods, few have provided a systematic comparison in this context. Additionally, in this study, these methods were evaluated in light of recent advancements, highlighting the practical implications for resource-constrained environments.
The rest of this study is organized as follows: Section 2 reviews work related to standard pooling methods proposed for computer vision and various image-analysis applications. Section 3 describes the datasets and experimental procedures, presents the results, and provides a detailed discussion of our study, and Section 4 summarizes the study.

3. Materials and Methods

To understand the impact of pooling techniques on the performance of convolutional neural networks (CNNs), this study precisely analyzes three standard datasets, each chosen for its unique characteristics and challenges. The methodologies employed are designed to systematically evaluate and compare the effectiveness of different pooling strategies, thereby offering insights into their implications for CNN performance.

3.1. Datasets

This study used three standard datasets to evaluate the performance of pooling techniques across various convolutional neural network (CNN) architectures: MNIST [], CIFAR-10 [], and CIFAR-100 []. These datasets represent a range of complexities, from simple handwritten digits (MNIST) to diverse object classes (CIFAR-10 and CIFAR-100). The selection of these datasets allows for a comprehensive analysis of how different pooling methods (max, average, and min pooling) impact model performance across varying levels of classification difficulty.
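All three datasets are bundled with Keras, so a minimal loading sketch (assuming the tf.keras.datasets API; the preprocessing choices shown are illustrative) looks as follows.

```python
import tensorflow as tf

# MNIST: 28x28 grayscale digits, 10 classes
(x_mnist, y_mnist), _ = tf.keras.datasets.mnist.load_data()

# CIFAR-10 / CIFAR-100: 32x32 RGB images with 10 and 100 classes, respectively
(x_c10, y_c10), _ = tf.keras.datasets.cifar10.load_data()
(x_c100, y_c100), _ = tf.keras.datasets.cifar100.load_data()

# Typical (illustrative) preprocessing: scale pixels to [0, 1] and one-hot encode labels
x_c10 = x_c10.astype("float32") / 255.0
y_c10 = tf.keras.utils.to_categorical(y_c10, num_classes=10)
```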

3.2. Model Architecture

This study employs three widely adopted CNN architectures: LeNet, AlexNet, and ResNet. These architectures were selected for their established performance and distinct structural characteristics, which provide a robust testing ground for evaluating different pooling strategies. LeNet is a smaller architecture primarily designed for simpler tasks, such as digit classification, and consists of convolutional and fully connected layers. In contrast, AlexNet features a deeper design with more convolutional layers, specifically engineered to address complex image-classification problems. ResNet, recognized for its innovative use of residual connections, is deeper and more intricate than both LeNet and AlexNet, making it particularly well suited for high-dimensional image-classification tasks. Together, these architectures create a comprehensive platform for analyzing the effects of various pooling techniques on model performance across a range of classification challenges.
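As an illustration of the simplest of these architectures, a LeNet-style Keras model might be sketched as below; the layer sizes follow the classic LeNet-5 layout rather than the exact configurations in Tables 3–5, and the pooling layer and dropout rate are exposed as parameters so the variants under comparison can be swapped in. The `build_lenet` helper is a hypothetical name introduced here for illustration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_lenet(input_shape=(28, 28, 1), num_classes=10,
                pooling=layers.MaxPooling2D, dropout=0.0):
    """LeNet-style CNN; `pooling` selects the pooling variant, `dropout` the dropout rate."""
    return models.Sequential([
        layers.Conv2D(6, kernel_size=5, activation="tanh",
                      padding="same", input_shape=input_shape),
        pooling(pool_size=2),
        layers.Conv2D(16, kernel_size=5, activation="tanh"),
        pooling(pool_size=2),
        layers.Flatten(),
        layers.Dense(120, activation="tanh"),
        layers.Dense(84, activation="tanh"),
        layers.Dropout(dropout),
        layers.Dense(num_classes, activation="softmax"),
    ])

# The pooling variant can be swapped without touching the rest of the architecture:
lenet_max = build_lenet(pooling=layers.MaxPooling2D)
lenet_avg = build_lenet(pooling=layers.AveragePooling2D)
```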

3.3. Experimental Setup

The experiments in this study were conducted using Keras with TensorFlow as the backend framework, establishing a controlled environment for evaluating the performance of different pooling techniques across various CNN architectures. Each dataset (MNIST, CIFAR-10, and CIFAR-100) was divided using an 80/20 split, allocating 80% of the data for training and reserving 20% for testing. This division allowed for a thorough assessment of both the recognition capabilities and the generalization performance of the models. To ensure consistency and fairness in the evaluation, model training was executed under multiple configurations of batch sizes (32, 64, 128, and 256), learning rates (0.01, 0.001, and 0.0001), and dropout rates (ranging from 0.1 to 0.5, including a no-dropout condition). The Stochastic Gradient Descent (SGD) optimizer was utilized for the LeNet architecture, while the Adam and RMSProp optimizers were applied to AlexNet and ResNet, respectively, optimizing model performance for multi-class classification tasks. The categorical cross-entropy loss function guided the training process, with early stopping employed based on validation loss to mitigate overfitting. This comprehensive setup ensured a rigorous evaluation of each pooling method's impact on model accuracy and generalization across diverse datasets. The architectures of the compared approaches are presented in Table 2, Table 3, Table 4 and Table 5.
Table 2. Proposed architecture of CNN.
Table 3. Proposed Architecture of LeNet.
Table 4. Proposed architecture of AlexNet.
Table 5. Proposed architecture of ResNet.
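A condensed sketch of this training setup is given below. It reuses the hypothetical `build_lenet` helper from the Section 3.2 sketch, uses scikit-learn's `train_test_split` for the 80/20 split, and, purely for illustration, monitors the held-out split for early stopping; it is a sketch of the described procedure, not the authors' actual scripts.

```python
import tensorflow as tf
from sklearn.model_selection import train_test_split

# Load MNIST and apply the 80/20 train/test split described above.
(x, y), _ = tf.keras.datasets.mnist.load_data()
x = (x.astype("float32") / 255.0)[..., None]            # add a channel dimension
y = tf.keras.utils.to_categorical(y, num_classes=10)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

model = build_lenet()                                    # hypothetical helper from the Section 3.2 sketch
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),   # SGD for LeNet
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# Early stopping on validation loss to mitigate overfitting, as described in the text.
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                              restore_best_weights=True)

history = model.fit(x_train, y_train,
                    batch_size=32,                       # one of {32, 64, 128, 256}
                    epochs=50,
                    validation_data=(x_test, y_test),
                    callbacks=[early_stop])
```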

3.4. Result and Analysis

In our research, each comparative approach, involving the CNN, AlexNet, ResNet, and LeNet architectures, was subjected to different configurations of batch size, learning rate, and dropout. These hyperparameters were varied to explore their impact on model performance and convergence across the pooling techniques (max and average) on the MNIST, CIFAR-10, and CIFAR-100 datasets. Given the sensitivity of these hyperparameters in influencing training dynamics, their variation resulted in divergent accuracy results across the models. Batch sizes were chosen to regulate the number of samples processed per iteration, affecting gradient updates and convergence speed. Learning rates played a critical role in controlling the step size during optimization, impacting the model's ability to navigate the loss landscape. Additionally, dropout rates were varied to mitigate overfitting by randomly deactivating neurons during training, affecting the model's generalization capability. Consequently, the discrepancy in accuracy outcomes underscores the nuanced interplay between these hyperparameters and their consequential effect on model learning and generalization across different network architectures and pooling strategies.
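Expressed as code, the sweep over these configurations can be written as a simple grid loop; the sketch below is illustrative, reusing the hypothetical `build_lenet` helper and the data split from the earlier sketches, and showing only the LeNet/SGD case.

```python
import itertools
import tensorflow as tf

batch_sizes    = [32, 64, 128, 256]
learning_rates = [0.01, 0.001, 0.0001]
dropout_rates  = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]          # 0.0 = no-dropout condition
poolings       = {"max": tf.keras.layers.MaxPooling2D,
                  "avg": tf.keras.layers.AveragePooling2D}

results = {}
for pool_name, bs, lr, dr in itertools.product(poolings, batch_sizes, learning_rates, dropout_rates):
    model = build_lenet(pooling=poolings[pool_name], dropout=dr)   # helper from the earlier sketch
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=lr),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    model.fit(x_train, y_train, batch_size=bs, epochs=20, verbose=0,
              validation_data=(x_test, y_test),
              callbacks=[tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3)])
    _, acc = model.evaluate(x_test, y_test, verbose=0)
    results[(pool_name, bs, lr, dr)] = acc                # accuracy per configuration
```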

3.5. Performance Evaluation in Terms of Accuracy

This section provides a comprehensive comparative analysis of the accuracy achieved on the MNIST, CIFAR-10, and CIFAR-100 datasets across various convolutional neural network architectures, including CNN, LeNet, AlexNet, and ResNet. The performance of these architectures is evaluated under different batch sizes, learning rates, and dropout rates, focusing on the effects of max pooling and average pooling techniques. The objective is to identify which pooling technique and architectural configuration yield the best performance for each dataset, thereby offering valuable insights into optimizing CNN models for diverse image-classification tasks. The detailed results of the compared approaches are presented in Table 6, Table 7, Table 8, Table 9, Table 10 and Table 11.
Table 6. Accuracy of max pooling on standard MNIST dataset with different learning rates and batch sizes with comparative approaches.
Table 7. Accuracy of average pooling on standard MNIST dataset with different learning rates and batch sizes with comparative approaches.
Table 8. Accuracy of max pooling on standard CIFAR-10 dataset with different learning rates and batch sizes with comparative approaches.
Table 9. Accuracy of average pooling on CIFAR-10 dataset with different learning rates and batch sizes with comparative approaches.
Table 10. Accuracy of max pooling on standard CIFAR-100 dataset with different learning rates and batch sizes with comparative approaches.
Table 11. Accuracy of average pooling on CIFAR-100 dataset with different learning rates and batch sizes with comparative approaches.
The comparative analysis of CNN, AlexNet, ResNet, and LeNet using max and average pooling methods under varying batch sizes and dropout rates (Table 6, Table 7, Table 8, Table 9, Table 10 and Table 11) highlights the distinct impact of pooling strategies on model performance. Under max pooling conditions, AlexNet consistently outperformed the other models, showcasing its ability to extract dominant features from the data efficiently. This superior performance is evident in AlexNet's stable and high classification accuracy across different configurations, even as dropout rates and batch sizes varied. The architecture of AlexNet is particularly suited to capturing the most salient features, which is effectively facilitated by the max pooling method. In contrast, CNN emerged as the second-best performer in the max pooling setup. Although it demonstrated strong feature-extraction capabilities, CNN showed higher sensitivity to changes in hyperparameters compared to AlexNet, indicating some variability in performance.
When average pooling was employed, the performance dynamics shifted, with CNN taking the lead. CNN’s architecture leveraged the generalization provided by average pooling to smooth feature representations, resulting in more stable and consistent accuracy, particularly at moderate dropout rates and optimized batch sizes. This suggests that average pooling is effective for CNN, as it promotes a balanced feature representation across spatial dimensions and reduces overfitting. AlexNet, while still achieving respectable accuracy, performed suboptimally with average pooling. Its reliance on max pooling for optimal feature extraction limited its ability to fully exploit the benefits of average pooling, which is better suited for models that require feature smoothing rather than emphasizing the most activated features.
Both ResNet and LeNet exhibited relatively lower performance with both pooling methods. ResNet’s complex architecture with skip connections did not gain significant benefits from either pooling strategy, while LeNet’s simpler structure was unable to fully capitalize on the feature-extraction capabilities of max or average pooling. Overall, this study underscores the importance of selecting pooling strategies that align with the architectural strengths of deep learning models. The findings demonstrate that AlexNet excels with max pooling, while CNN performs best with average pooling, highlighting the necessity of adaptive pooling approaches to optimize model accuracy based on the specific dataset and model architecture.

3.6. Statistical Significance Analysis

To further validate the robustness and reliability of the comparative results between max and average pooling methods across standard datasets (CIFAR-10, CIFAR-100, and MNIST) using CNN, AlexNet, ResNet, and LeNet, a statistical significance analysis was conducted. Specifically, p-values were calculated to assess whether the differences in performance metrics, such as accuracy, precision, and computational efficiency, were statistically significant. The p-values provide an objective measure to determine whether the observed differences between pooling methods are due to random variations or represent true distinctions in performance []. A significance threshold of 0.05 was adopted, where p-values below this level indicate that the performance differences are statistically significant and unlikely to have occurred by chance. Table 12 shows the comparative analysis of p-values on standard datasets.
Table 12. Comparative analysis of p-values on standard datasets.
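The paper does not state which statistical test produced the p-values in Table 12; one common choice for matched accuracy results is a paired t-test, sketched below with SciPy on placeholder accuracy values.

```python
from scipy import stats

# Accuracies of the same model over matched configurations (batch size / dropout),
# once with max pooling and once with average pooling. Values are placeholders.
acc_max = [0.9871, 0.9864, 0.9880, 0.9855, 0.9869]
acc_avg = [0.9858, 0.9849, 0.9882, 0.9837, 0.9851]

# Paired t-test: are the per-configuration differences significantly different from zero?
t_stat, p_value = stats.ttest_rel(acc_max, acc_avg)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
# p < 0.05 would indicate a statistically significant difference between the two pooling methods.
```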
The p-value results highlight the comparative performance of max pooling and average pooling across various models and datasets, demonstrating that max pooling consistently yields more statistically significant results in many cases. For instance, in the MNIST dataset, max pooling shows superior performance, especially with LeNet, where its p-values are significantly lower across all learning rates, indicating stronger statistical significance compared to average pooling. This trend is also evident in the CIFAR-10 dataset, where max pooling exhibits better robustness, particularly at a learning rate of 0.01, with a p-value of 0.0042 compared to 0.0001 for average pooling. Similarly, in the CIFAR-100 dataset, max pooling performs better at lower learning rates, maintaining strong statistical significance with more consistent p-values across architectures like CNN and LeNet. While average pooling occasionally shows lower p-values, particularly in deeper models like ResNet, max pooling demonstrates greater consistency and reliability across the board. Overall, max pooling emerges as the more robust and reliable pooling method, offering better statistical significance and performance stability across a variety of datasets and architectures. Figure 5 illustrates the comparative analysis of p-values for max pooling and average pooling across different neural network architectures and data.
Figure 5. Comparison of the statistical significance of p-values for max pooling and average pooling in CNN, LeNet, AlexNet, and ResNet across MNIST (b), CIFAR-10 (a), and CIFAR-100 (c) datasets.

3.7. Convergence Graph

A convergence graph visualizes the learning progress of a model over time, illustrating the effectiveness of various training parameters and architectures. This analysis focuses on the convergence graphs of four neural network architectures, CNN, AlexNet, LeNet, and ResNet, generated using Matplotlib, as shown in Figure 6. Key observations from the comparative analysis reveal that CNN and AlexNet exhibit stable convergence with minimal oscillations, indicating reliable training processes. AlexNet converges quickly due to its deeper architecture, while ResNet shows initial instability but ultimately achieves strong performance, reflecting its robustness. AlexNet and ResNet demonstrate rapid initial improvements due to their complex designs, effectively capturing intricate patterns, whereas CNN and LeNet show more gradual progress. The plateau in AlexNet and stabilization in ResNet suggest these models reach optimal performance quickly, while CNN and LeNet may require more epochs for full optimization. Overall, the graphs highlight the distinct strengths of each architecture: CNN and AlexNet provide stable learning for simpler tasks, AlexNet excels in fast early-stage learning for complex data, and ResNet, despite initial fluctuations, demonstrates high performance and robustness. Adjusting hyperparameters like learning rate and batch size can further enhance these architectures for specific datasets and tasks.
Figure 6. Accuracy comparison against different parameters on MNIST, CIFAR-10, and CIFAR-100.
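A convergence graph of this kind can be produced directly from the History objects returned by Keras' model.fit(); the Matplotlib sketch below is illustrative and assumes such histories have been collected for each model, as in the earlier training sketch.

```python
import matplotlib.pyplot as plt

def plot_convergence(histories):
    """Plot validation accuracy per epoch; `histories` maps a model name to a Keras History."""
    plt.figure(figsize=(8, 5))
    for name, hist in histories.items():
        plt.plot(hist.history["val_accuracy"], label=name)
    plt.xlabel("Epoch")
    plt.ylabel("Validation accuracy")
    plt.title("Convergence of CNN, LeNet, AlexNet, and ResNet")
    plt.legend()
    plt.tight_layout()
    plt.show()

# Example usage with a history collected as in the earlier training sketch:
# plot_convergence({"LeNet (max pooling)": history})
```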

3.8. Analysis of Optimal Pooling Performance and Parameter Trends

This section identifies and discusses the optimal performance achieved by max and average pooling across the MNIST, CIFAR-10, and CIFAR-100 datasets while also addressing the observed trends in performance based on parameter variations.
The CNN performs best with average pooling, achieving higher accuracy on MNIST and leveraging this pooling method to generalize effectively across the dataset. Specifically, CNN peaks with average pooling at 98.82% accuracy on MNIST using a learning rate of 0.001 and a batch size of 32 with a low dropout rate, suggesting that CNN benefits from the smoothed feature extraction of average pooling. In contrast, ResNet exhibits superior performance with max pooling, particularly on CIFAR-10 and CIFAR-100, where complex features demand higher selectivity. For instance, ResNet’s optimal CIFAR-10 performance with max pooling reaches 54.68% at a learning rate of 0.0001 and dropout of 0.4–0.5, showcasing its alignment with max pooling’s concentrated feature selection.
On the MNIST dataset, each model displays varied responses to pooling methods. LeNet performs moderately, achieving its peak of 98.9% accuracy with average pooling, a learning rate of 0.001, and a batch size of 128, though it generally trails CNN and AlexNet. AlexNet demonstrates robustness with both pooling methods, attaining high performance with max pooling, especially on CIFAR-10, where it reaches 67.89% accuracy with a learning rate of 0.001 and moderate dropout. This suggests that AlexNet's deeper architecture handles CIFAR-10's feature complexity well under max pooling. ResNet, which excels at retaining intricate feature details, benefits significantly from max pooling across datasets, particularly CIFAR-100, achieving around 17.13% accuracy. Although this result is lower than with simpler datasets, it reflects ResNet's high selectivity in feature extraction.
On CIFAR-100, which is more challenging due to its 100-class complexity, CNN’s accuracy peaks at 22.26% with max pooling, again showing a preference for structured feature selection. LeNet, on the other hand, struggles to maintain high accuracy across both pooling methods, achieving only around 13.5% accuracy on CIFAR-100, pointing to limitations in its simpler architecture for such complex data. AlexNet continues to perform relatively well on CIFAR-100, reaching its highest accuracy at around 30.89% with max pooling, while ResNet achieves its best result with max pooling, reaching around 17.13%. The analysis confirms that CNN benefits most from average pooling on MNIST, while ResNet performs best on max pooling across CIFAR datasets. This pattern underlines the importance of aligning pooling techniques with model architecture and dataset complexity for optimal performance.
The observed parameter trends in Table 6, Table 7, Table 8, Table 9, Table 10 and Table 11 reveal the importance of the batch size, learning rate, and dropout rate adjustments, which significantly impact pooling performance across different architectures and datasets. Lower learning rates and moderate batch sizes generally enhanced model stability and accuracy, especially for complex datasets like CIFAR-100. Higher dropout rates tended to improve generalization in average pooling, which smooths feature representations, though they occasionally hindered accuracy in configurations where feature retention was critical, as in max pooling. These trends underscore that max pooling often benefits from moderate batch sizes and minimal dropout to retain strong activations, whereas average pooling performs optimally with larger batch sizes and higher dropout rates, especially for noisier datasets. It is important to note that while these trends offer insight, generalizing them across all applications is challenging, as parameter effects can vary significantly depending on dataset characteristics and task requirements. Adaptive parameter tuning based on dataset-specific needs could provide more reliable results in future studies. Overall, these findings suggest that selecting pooling methods based on dataset complexity and noise levels is crucial, particularly when applying CNNs to resource-constrained environments, where optimized configurations can greatly enhance performance and generalization.

3.9. Discussion

This research provided an in-depth analysis of various pooling methods in Convolutional Neural Networks (CNNs), specifically max pooling, across datasets such as MNIST, CIFAR-10, and CIFAR-100. The results demonstrate that each method has unique strengths and limitations that make it suitable for different tasks. Max pooling consistently performed well in scenarios where preserving high-contrast features and robustness to noise were critical. The ability of max pooling to capture the most prominent features from the feature maps allowed it to excel in classification tasks with high-dimensionality input, such as the CIFAR-100 dataset. However, the technique's downside lies in its tendency to discard potentially useful information, which may explain its reduced performance when applied to datasets with smaller objects or more intricate details, where preserving all information is important. This is particularly relevant in applications such as small object detection, where discarding finer details may lead to poor localization accuracy, as noted in several studies on object detection.
In contrast, average pooling showed more balanced feature representation and performed well in complex datasets, such as CIFAR-10. By smoothing out noise and reducing the impact of any outlier values in the feature map, average pooling can generalize better in tasks where the input is noisy or where capturing the overall context is more important than highlighting specific high-contrast features. For instance, in semantic segmentation tasks, where each pixel's classification matters more than a focus on high-intensity regions, average pooling may outperform max pooling. However, the downside of this method is its inability to preserve sharp edges and fine details, which are crucial in high-precision tasks like medical image analysis or object detection.
Min pooling, though less commonly used, showed its utility in highly specialized tasks such as anomaly detection and background subtraction. By focusing on the least prominent features, min pooling can highlight anomalies or subtle differences in an image, which can be critical in applications like fraud detection or medical imaging, where identifying outliers or rare features is essential. However, the sensitivity of min pooling to noise limits its general applicability in mainstream image-classification tasks, where higher-contrast features dominate.
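Keras provides MaxPooling2D and AveragePooling2D out of the box but no min pooling layer; a common workaround, shown in the illustrative sketch below (the MinPooling2D class name is introduced here, not taken from this study), is to negate the input, max-pool, and negate the result again.

```python
import tensorflow as tf

class MinPooling2D(tf.keras.layers.Layer):
    """Min pooling implemented via max pooling on the negated input: min(x) = -max(-x)."""

    def __init__(self, pool_size=2, **kwargs):
        super().__init__(**kwargs)
        self.max_pool = tf.keras.layers.MaxPooling2D(pool_size=pool_size)

    def call(self, inputs):
        return -self.max_pool(-inputs)

# The least prominent activations survive, which is why min pooling suits specialized tasks
# such as anomaly detection or background subtraction rather than mainstream classification.
x = tf.random.normal((1, 32, 32, 8))
print(MinPooling2D(pool_size=2)(x).shape)   # (1, 16, 16, 8)
```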
The results also underscore that no single pooling method universally outperforms others across all tasks, architectures, and datasets. The effectiveness of a pooling technique is highly dependent on the specific task and dataset characteristics. Max pooling might be preferable for tasks involving object detection or datasets with large, prominent features, whereas average pooling would be a better choice for more balanced, noisy datasets requiring more generalization, such as in semantic segmentation. Moreover, recent advancements in CNN architectures, such as hybrid pooling methods and adaptive pooling strategies, have sought to combine the strengths of multiple pooling operations. For example, adaptive pooling methods, which dynamically adjust the pooling strategy based on the input, have shown promise in enhancing performance by balancing feature preservation and computational efficiency. These approaches allow CNNs to adapt better to the varying complexities of different datasets, especially for tasks requiring fine-grained classification like medical imaging and high-precision applications. In practical terms, the choice of pooling method has important implications for resource-constrained environments. Max pooling offers high accuracy but at the cost of potentially discarding important information, while average pooling provides a more generalizable solution but with potential loss of detail. Min pooling is effective for specialized tasks but may not be suitable for broader applications. Future research could benefit from exploring adaptive pooling techniques that optimize the trade-offs between computational cost and feature retention, ensuring that the choice of pooling method aligns with specific task requirements.
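As one simple illustration of such a hybrid strategy, the sketch below defines a pooling layer that blends max and average pooling through a single trainable mixing weight; this is a generic example of the idea, not a method proposed or evaluated in this paper.

```python
import tensorflow as tf

class MixedPooling2D(tf.keras.layers.Layer):
    """Learnable convex combination of max and average pooling: a*max + (1-a)*avg."""

    def __init__(self, pool_size=2, **kwargs):
        super().__init__(**kwargs)
        self.max_pool = tf.keras.layers.MaxPooling2D(pool_size=pool_size)
        self.avg_pool = tf.keras.layers.AveragePooling2D(pool_size=pool_size)

    def build(self, input_shape):
        # Unconstrained logit, squashed through a sigmoid so the mixing weight stays in [0, 1].
        self.logit = self.add_weight(name="mix_logit", shape=(),
                                     initializer="zeros", trainable=True)

    def call(self, inputs):
        alpha = tf.sigmoid(self.logit)
        return alpha * self.max_pool(inputs) + (1.0 - alpha) * self.avg_pool(inputs)

# Drop-in replacement for a fixed pooling layer; training determines the max/average balance.
x = tf.random.normal((1, 32, 32, 8))
print(MixedPooling2D(pool_size=2)(x).shape)   # (1, 16, 16, 8)
```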

4. Conclusions

Convolutional Neural Networks (CNNs) play a crucial role in computer vision, with pooling layers improving efficiency by reducing complexity while preserving key features. This study compared max and average pooling across CNN architectures (AlexNet, ResNet, and LeNet) using the MNIST, CIFAR-10, and CIFAR-100 datasets. Max pooling performed best in high-dimensionality tasks, and average pooling excelled in handling noisy data. Specifically, the practical recommendations based on the results are to use max pooling for resource-limited environments, and average pooling for tasks involving noisy datasets. While max pooling has demonstrated robust performance across the datasets in this study, average pooling may still play a valuable role in certain contexts, such as noisy data environments or tasks emphasizing spatial continuity. By exploring adaptive or hybrid pooling methods in future research, it may be possible to leverage the strengths of both max and average pooling to enhance model flexibility and performance across a broader range of applications. Future research should explore integrating pooling strategies with Vision Transformers (ViTs) to reduce computational overhead, and adaptive pooling methods could enhance performance in tasks like small object detection and fine-grained classification. Vision Transformers may revolutionize feature extraction by processing global image context without traditional pooling layers.

Author Contributions

Conceptualization, A.Z.; methodology, A.A. (Amerah Alabrah) and N.S.; software, S.Z.; investigation, M.N.; writing—original draft preparation, S.R. and M.S.; writing—review and editing, S.R. and A.A. (Amerah Alabrah); supervision, A.A. (Ali Arshad); funding acquisition, A.A. (Amerah Alabrah). All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Researchers Supporting Project number RSP2024R476, King Saud University, Riyadh, Saudi Arabia.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Zhao, X.; Wang, L.; Zhang, Y.; Han, X.; Deveci, M.; Parmar, M. A review of convolutional neural networks in computer vision. Artif. Intell. Rev. 2024, 57, 99. [Google Scholar] [CrossRef]
  2. Archana, R.; Jeevaraj, P.E. Deep learning models for digital image processing: A review. Artif. Intell. Rev. 2024, 57, 11. [Google Scholar] [CrossRef]
  3. Singh, S.; Gupta, A.; Katiyar, K. Neural modeling and neural computation in a medical approach. In Computational Techniques in Neuroscience; CRC Press: Boca Raton, FL, USA, 2023; pp. 19–41. [Google Scholar]
  4. Taye, M.M. Theoretical understanding of convolutional neural network: Concepts, architectures, applications, future directions. Computation 2023, 11, 52. [Google Scholar] [CrossRef]
  5. Jiang, P.; Xue, Y.; Neri, F. Convolutional neural network pruning based on multi-objective feature map selection for image classification. Appl. Soft Comput. 2023, 139, 110229. [Google Scholar] [CrossRef]
  6. Valkenborg, D.; Rousseau, A.J.; Geubbelmans, M.; Burzykowski, T. Support vector machines. Am. J. Orthod. Dentofac. Orthop. 2023, 164, 754–757. [Google Scholar] [CrossRef]
  7. Zhang, Z. Introduction to machine learning: K-nearest neighbors. Ann. Transl. Med. 2019, 4, 218. [Google Scholar] [CrossRef]
  8. Zhao, T.; Xie, Y.; Wang, Y.; Cheng, J.; Guo, X.; Hu, B.; Chen, Y. A survey of deep learning on mobile devices: Applications, optimizations, challenges, and research opportunities. Proc. IEEE 2022, 110, 334–354. [Google Scholar] [CrossRef]
  9. De Oliveira, C.I.; do Nascimento, M.Z.; Roberto, G.F.; Tosta, T.A.; Martins, A.S.; Neves, L.A. Hybrid models for classifying histological images: An association of deep features by transfer learning with ensemble classifier. Multimed. Tools Appl. 2024, 83, 21929–21952. [Google Scholar] [CrossRef]
  10. Dogan, Y. A new global pooling method for deep neural networks: Global average of top-k max-pooling. Trait. Du Signal 2023, 40, 577–587. [Google Scholar] [CrossRef]
  11. Chen, Y.; Fang, J.; Zhang, X.; Miao, Y.; Lin, Y.; Tu, R.; Hu, L. Pool fire dynamics: Principles, models and recent advances. Prog. Energy Combust. Sci. 2023, 95, 101070. [Google Scholar] [CrossRef]
  12. Pan, X.; Xu, J.; Pan, Y.; Wen, L.; Lin, W.; Bai, K.; Fu, H.; Xu, Z. Afinet: Attentive feature integration networks for image classification. Neural Netw. 2022, 155, 360–368. [Google Scholar] [CrossRef] [PubMed]
  13. Zhao, L.; Zhang, Z. A improved pooling method for convolutional neural networks. Sci. Rep. 2024, 14, 1589. [Google Scholar] [CrossRef] [PubMed]
  14. Krichen, M. Convolutional neural networks: A survey. Computers 2023, 12, 151. [Google Scholar] [CrossRef]
  15. Matoba, K.; Dimitriadis, N.; Fleuret, F. Benefits of Max Pooling in Neural Networks: Theoretical and Experimental Evidence. In Transactions on Machine Learning Research; 2023. Available online: https://openreview.net/forum?id=YgeXqrH7gA (accessed on 15 September 2024).
  16. Qiu, Y.; Liu, Y.; Chen, Y.; Zhang, J.; Zhu, J.; Xu, J. A2SPPNet: Attentive atrous spatial pyramid pooling network for salient object detection. IEEE Trans. Multimed. 2022, 25, 1991–2006. [Google Scholar] [CrossRef]
  17. Tong, K.; Wu, Y.; Zhou, F. Recent advances in small object detection based on deep learning: A review. Image Vis. Comput. 2020, 97, 103910. [Google Scholar] [CrossRef]
  18. Zhou, J.; Liang, Z.; Tan, Z.; Li, W.; Li, Q.; Ying, Z.; Zhai, Y.; He, Y.; Shen, Z. RVDNet: Rotated Vehicle Detection Network with Mixed Spatial Pyramid Pooling for Accurate Localization. In International Conference on Artificial Intelligence and Communication Technology; Springer Nature: Singapore, 2023; pp. 303–316. [Google Scholar]
  19. Özdemir, C. Avg-topk: A new pooling method for convolutional neural networks. Expert Syst. Appl. 2023, 223, 119892. [Google Scholar] [CrossRef]
  20. Tang, T.N.; Kim, K.; Sohn, K. Temporalmaxer: Maximize temporal context with only max pooling for temporal action localization. arXiv 2023, arXiv:2303.09055. [Google Scholar]
  21. Bianchi, F.M.; Lachi, V. The expressive power of pooling in graph neural networks. Adv. Neural Inf. Process. Syst. 2024, 36. [Google Scholar] [CrossRef]
  22. Zhu, X.; Meng, Q.; Ding, B.; Gu, L.; Yang, Y. Weighted pooling for image recognition of deep convolutional neural networks. Clust. Comput. 2019, 22, 9371–9383. [Google Scholar] [CrossRef]
  23. Stergiou, A.; Poppe, R.; Kalliatakis, G. Refining activation downsampling with SoftPool. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 10357–10366. [Google Scholar]
  24. Walter, B. Analysis of convolutional neural network image classifiers in a hierarchical max-pooling model with additional local pooling. J. Stat. Plan. Inference 2023, 224, 109–126. [Google Scholar] [CrossRef]
  25. Chen, J.; Hu, H.; Wu, H.; Jiang, Y.; Wang, C. Learning the best pooling strategy for visual semantic embedding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 15789–15798. [Google Scholar]
  26. Khairandish, M.O.; Sharma, M.; Jain, V.; Chatterjee, J.M.; Jhanjhi, N.Z. A hybrid CNN-SVM threshold segmentation approach for tumor detection and classification of MRI brain images. IRBM 2022, 43, 290–299. [Google Scholar] [CrossRef]
  27. Ding, Y.; Yu, J.; Lv, Q.; Zhao, H.; Dong, J.; Li, Y. Multiview adaptive attention pooling for image–text retrieval. Knowl.-Based Syst. 2024, 291, 111550. [Google Scholar] [CrossRef]
  28. Li, H.; Cheng, Y.; Ni, H.; Zhang, D. Dual-path recommendation algorithm based on CNN and attention-enhanced LSTM. Cyber-Phys. Syst. 2024, 10, 247–262. [Google Scholar] [CrossRef]
  29. Han, Y.; Huang, G.; Song, S.; Yang, L.; Wang, H.; Wang, Y. Dynamic neural networks: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 7436–7456. [Google Scholar] [CrossRef] [PubMed]
  30. Seng, L.M.; Chiang, B.B.C.; Salam, Z.A.A.; Tan, G.Y.; Chai, H.T. MNIST handwritten digit recognition with different CNN architectures. J. Appl. Technol. Innov 2021, 5, 7–10. [Google Scholar]
  31. Giuste, F.O.; Vizcarra, J.C. Cifar-10 image classification using feature ensembles. arXiv 2020, arXiv:2002.03846. [Google Scholar]
  32. Singla, S.; Singla, S.; Feizi, S. Improved deterministic l2 robustness on CIFAR-10 and CIFAR-100. arXiv 2021, arXiv:2108.04062. [Google Scholar]
  33. Hopkins, W.G.; Rowlands, D.S. Standardization and other approaches to meta-analyze differences in means. Stat. Med. 2024, 43, 3092–3108. [Google Scholar] [CrossRef]
