Composition of Activation Functions and the Reduction to Finite Domain
Abstract
1. Introduction
- Enhanced Capacity for Complex Modeling:
- Diversification of Non-linearity: Different activation functions have different characteristics. For example, ReLU introduces sparsity, while Sigmoid squashes values into a bounded range. By composing them, the network can potentially learn a wider variety of non-linear transformations and capture more intricate patterns in the data (see the first sketch after this list).
- Improved Training Dynamics:
- Mitigating Gradient Problems: Activation functions shape gradient flow during training. Using different activation functions can potentially help address issues such as vanishing or exploding gradients, which hinder learning in deep networks (see the second sketch after this list).
- Faster Convergence: Certain activation functions, like ReLU, can accelerate the convergence of the training process compared to others like Sigmoid or Tanh. Combining different functions can potentially lead to faster training and competitive performance.
- Enhanced Generalization and Robustness:
- Better Generalization: By learning richer representations of the data through diverse activation functions, the network’s ability to generalize well to unseen data improves, reducing the risk of overfitting.
- Increased Robustness: Networks with carefully chosen activation functions can handle variations in input data more effectively, adapting to noise, missing data, or unexpected perturbations.
- Adaptation to Input Characteristics:
- Handling Diverse Data: Different activation functions can be suited to different data characteristics. For instance, Tanh, being symmetric about zero, can be useful when dealing with data containing both positive and negative values.
- Potential for Architectural Interpretability:
- Insight into Learning: By using distinct activation functions, different parts of the network might become responsible for capturing specific features, which can potentially offer insights into how the model learns.
In short, composing activation functions is aimed at:
- Learning more complex patterns.
- Faster and more stable training.
- Better generalization to new data.
- Greater adaptability to diverse data.
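To make the composition idea concrete, the following is a minimal NumPy sketch, not taken from the paper, that builds a composite activation h(x) = tanh(ReLU(x)) and evaluates it together with its chain-rule derivative. The particular choice of Tanh composed with ReLU and the helper names (`composite`, `composite_grad`) are illustrative assumptions, not the construction studied in the article.

```python
# Minimal NumPy sketch (illustrative, not from the paper): composing two
# standard activations, here tanh applied after ReLU, and inspecting the
# values and derivatives of the composite.
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def relu_grad(x):
    # Subgradient convention: 0 at x = 0
    return (x > 0.0).astype(float)

def tanh_grad(x):
    return 1.0 - np.tanh(x) ** 2

def composite(x):
    # h(x) = tanh(ReLU(x)): exactly zero (sparse) for x <= 0,
    # bounded in (0, 1) for x > 0
    return np.tanh(relu(x))

def composite_grad(x):
    # Chain rule: h'(x) = tanh'(ReLU(x)) * ReLU'(x)
    return tanh_grad(relu(x)) * relu_grad(x)

x = np.linspace(-3.0, 3.0, 7)
print("x            :", x)
print("ReLU(x)      :", relu(x))
print("tanh(ReLU(x)):", composite(x))
print("h'(x)        :", composite_grad(x))
```

The printout exhibits the two behaviors described above: exact zeros (sparsity) for non-positive inputs, inherited from ReLU, and bounded outputs and gradients for positive inputs, inherited from Tanh.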
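The gradient-flow point can also be illustrated numerically. The second sketch below, again an illustration rather than anything from the paper, compares the factor by which a backpropagated signal is scaled after twenty layers when only the activation derivatives are considered (unit weights assumed): the Sigmoid factor shrinks geometrically because its derivative never exceeds 1/4, while the ReLU factor stays at 1 along an active path.

```python
# Minimal sketch (illustrative, not from the paper): how the activation
# derivative alone attenuates a backpropagated signal through many layers.
# With unit weights, the gradient reaching the first layer is the product
# of the per-layer activation derivatives along the path.
import math

def sigmoid_grad(x):
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)          # at most 0.25 for any input

def relu_grad(x):
    return 1.0 if x > 0.0 else 0.0

x = 0.5       # pre-activation value assumed at every layer
depth = 20    # number of stacked layers

sigmoid_factor = sigmoid_grad(x) ** depth   # shrinks geometrically (vanishing)
relu_factor = relu_grad(x) ** depth         # stays 1 on the active path

print(f"sigmoid-only gradient factor after {depth} layers: {sigmoid_factor:.3e}")
print(f"ReLU-only gradient factor after {depth} layers:    {relu_factor:.3e}")
```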
2. Basics
3. Background
4. Main Results
Funding
Data Availability Statement
Conflicts of Interest
