Machine Learning Application of Generalized Gaussian Radial Basis Function and Its Reproducing Kernel Theory
Abstract
1. Introduction
1.1. Motivation
1.2. Contributions
1.3. Plan of the Article
2. Notation and Preliminaries
2.1. Hypergeometric Function Notation
2.2. Field and Space Notations
2.3. Tensor Product Notation
2.4. Preliminaries
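Let $H$ be a Hilbert space of functions on a nonempty set $X$.

1. The space $H$ is referred to as a reproducing kernel Hilbert space (RKHS) if, for every $x \in X$, the evaluation functional $\delta_x$ defined as $\delta_x(f) = f(x)$ for $f \in H$ is continuous.

A function $K$ on $X \times X$ is a reproducing kernel of $H$ if:

1. $K(\cdot, x) \in H$ for every $x \in X$, that is, each kernel section $K_x := K(\cdot, x)$ belongs to $H$, and
2. $K$ has the reproducing property; that is, $f(x) = \langle f, K(\cdot, x) \rangle_H$ for all $f \in H$ and $x \in X$.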
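The reproducing property can be made concrete on the span of kernel sections, where the inner product is defined by $\langle K(\cdot, x_i), K(\cdot, y_j) \rangle_H := K(x_i, y_j)$ and extended bilinearly (the pre-Hilbert construction behind the Moore–Aronszajn theorem). The sketch below assumes the Gaussian RBF kernel as a stand-in for a generic positive definite kernel; the coefficients, centers, and `sigma` are illustrative only. It checks that $\langle f, K(\cdot, x) \rangle_H = f(x)$ for a finite expansion $f = \sum_i a_i K(\cdot, x_i)$.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """Stand-in kernel K(x, y) = exp(-(x - y)^2 / sigma^2); any positive definite kernel works."""
    return math.exp(-((x - y) ** 2) / sigma ** 2)

def evaluate(coeffs, centers, x, kernel=gaussian_kernel):
    """Evaluate f(x) = sum_i a_i K(x_i, x) for f in the span of kernel sections."""
    return sum(a * kernel(c, x) for a, c in zip(coeffs, centers))

def inner_product(coeffs_f, centers_f, coeffs_g, centers_g, kernel=gaussian_kernel):
    """<f, g>_H = sum_i sum_j a_i b_j K(x_i, y_j) on the span of kernel sections."""
    return sum(a * b * kernel(xi, yj)
               for a, xi in zip(coeffs_f, centers_f)
               for b, yj in zip(coeffs_g, centers_g))

# Hypothetical element f = 2 K(., -1) - 0.5 K(., 0.3) + K(., 1.2)
coeffs, centers = [2.0, -0.5, 1.0], [-1.0, 0.3, 1.2]
x = 0.4
# K(., x) is represented by the single coefficient 1.0 at the center x
lhs = inner_product(coeffs, centers, [1.0], [x])
rhs = evaluate(coeffs, centers, x)
print(lhs, rhs)
```

Because $K$ is symmetric, pairing $f$ with the section $K(\cdot, x)$ reduces term by term to $\sum_i a_i K(x_i, x) = f(x)$; positive definiteness of $K$ is what makes this bilinear form a genuine inner product.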
3. Function Space of Generalized Gaussian Radial Basis Measure
Orthonormal Basis
- For , we will show that . For this, consider that
- In order to establish the reproducing property of , pick an arbitrary . Then, consider the inner product of f with , as follows:
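For comparison, the classical Gaussian RBF kernel (not the GGRBF) has an explicit orthonormal basis described by Steinwart, Hush and Scovel. Writing the kernel as $k(x, y) = \exp(-(x - y)^2/\sigma^2)$ on $\mathbb{R}$, the basis is $e_n(x) = \sqrt{2^n/(\sigma^{2n} n!)}\, x^n e^{-x^2/\sigma^2}$, and $k(x, y) = \sum_{n \ge 0} e_n(x) e_n(y)$, which follows from expanding $e^{2xy/\sigma^2}$ as a power series. The sketch below checks only this series identity numerically (it does not verify orthonormality in the RKHS norm); the evaluation points and `sigma` are arbitrary illustrative choices.

```python
import math

def gaussian_kernel(x, y, sigma):
    # k(x, y) = exp(-(x - y)^2 / sigma^2) on the real line
    return math.exp(-((x - y) ** 2) / sigma ** 2)

def basis(n, x, sigma):
    # e_n(x) = sqrt(2^n / (sigma^(2n) n!)) * x^n * exp(-x^2 / sigma^2)
    coeff = math.sqrt(2.0 ** n / (sigma ** (2 * n) * math.factorial(n)))
    return coeff * x ** n * math.exp(-(x ** 2) / sigma ** 2)

def truncated_kernel(x, y, sigma, terms):
    # Partial sum of k(x, y) = sum_n e_n(x) e_n(y)
    return sum(basis(n, x, sigma) * basis(n, y, sigma) for n in range(terms))

# The truncation error shrinks rapidly as more basis functions are included
for terms in (2, 5, 10, 20):
    approx = truncated_kernel(0.3, -0.7, 1.0, terms)
    print(terms, abs(approx - gaussian_kernel(0.3, -0.7, 1.0)))
```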
4. Restriction and Universality of Reproducing Kernel of RKHS
5. Empirical Evidence and Results Comparison
5.1. Kernel Regression
5.1.1. Example 1
5.1.2. Example 2
5.2. Support Vector Machine
5.3. Neural Network
5.3.1. Activation Function
5.3.2. DCNN
6. Results and Technical Details
- We considered two kernel regression experiments on the functions shown in Figure 8 and Figure 9. These functions are not well behaved: they combine transcendental behavior with discontinuities, together with uniform randomness introduced by the random sample points. In both experiments, we record the minimum error obtained with the Gaussian Radial Basis Function (GRBF) kernel and with the Generalized Gaussian Radial Basis Function (GGRBF) kernel. The GGRBF kernel yields a remarkably small minimum error in both experiments, whereas the GRBF kernel yields minimum errors of 0.00230 (Figure 8) and 0.00100 (Figure 9).
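A minimal kernel-regression sketch follows, assuming kernel ridge regression with a small ridge parameter `lam` (the regression formulation used for Figures 8 and 9 is not stated here). The kernel is passed in as a function, so a GGRBF implementation could replace the `gaussian_rbf` stand-in; the GGRBF formula itself is not reproduced. The target function, sample points, `sigma`, and `lam` are hypothetical illustrative choices, not the article's setup.

```python
import math

def gaussian_rbf(x, y, sigma=0.2):
    # Stand-in kernel: GRBF K(x, y) = exp(-(x - y)^2 / sigma^2)
    return math.exp(-((x - y) ** 2) / sigma ** 2)

def solve(a, b):
    # Solve a @ z = b by Gaussian elimination with partial pivoting
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z

def fit_kernel_ridge(xs, ys, kernel, lam=1e-8):
    # Solve (K + lam I) alpha = y, then predict f(x) = sum_i alpha_i K(x_i, x)
    n = len(xs)
    gram = [[kernel(xs[i], xs[j]) + (lam if i == j else 0.0) for j in range(n)]
            for i in range(n)]
    alpha = solve(gram, ys)
    return lambda x: sum(a * kernel(xi, x) for a, xi in zip(alpha, xs))

def max_abs_error(model, xs, ys):
    # Largest absolute deviation between predictions and targets
    return max(abs(model(x) - y) for x, y in zip(xs, ys))

train_x = [i / 10 for i in range(11)]
train_y = [math.sin(2 * math.pi * x) for x in train_x]
model = fit_kernel_ridge(train_x, train_y, gaussian_rbf)
test_x = [0.05 + i / 10 for i in range(10)]
test_y = [math.sin(2 * math.pi * x) for x in test_x]
print(max_abs_error(model, train_x, train_y), max_abs_error(model, test_x, test_y))
```

Comparing two kernels then amounts to calling `fit_kernel_ridge` twice with different kernel functions on the same data and recording the smaller error.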
- Next, we consider support vector machines for binary classification. We generated a random set of points (one can think of them as data) within the unit circle and assigned each point to the positive class if it lies in the first or the third quadrant, and to the negative class otherwise. After generating and labeling this dataset, we identified the support vectors and the decision boundary with three custom kernel functions in MATLAB R2023b: the GRBF kernel, the sigmoid kernel, and the GGRBF kernel. We then recorded the out-of-sample misclassification rate of each kernel using 10-fold cross-validation. The GRBF kernel yields the highest misclassification rate of the three (5.75%, against 4.50% for the sigmoid kernel), while the GGRBF kernel yields the lowest.
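The data generation and 10-fold cross-validation protocol described above can be sketched in plain Python. A full SVM requires a quadratic-programming solver, so a kernel perceptron stands in for the SVM classifier here; this is a swapped-in technique, not the MATLAB procedure used in the experiment. The sample size, `seed`, `sigma`, and epoch count are hypothetical, and the Gaussian kernel is a placeholder that a GGRBF or sigmoid kernel function could replace.

```python
import math
import random

def make_quadrant_data(n, seed=0):
    # Uniform points in the unit disk; label +1 in quadrants I and III, -1 otherwise
    rng = random.Random(seed)
    data = []
    while len(data) < n:
        x, y = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            data.append(((x, y), 1 if x * y > 0 else -1))
    return data

def gaussian_rbf(p, q, sigma=0.5):
    # Stand-in GRBF kernel on R^2; another kernel function can be passed instead
    d2 = (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    return math.exp(-d2 / sigma ** 2)

def train_kernel_perceptron(train, kernel, epochs=10):
    # Dual perceptron: alpha_i counts the mistakes made on training point i
    alpha = [0] * len(train)
    for _ in range(epochs):
        mistakes = 0
        for i, (xi, yi) in enumerate(train):
            score = sum(a * yj * kernel(xj, xi)
                        for a, (xj, yj) in zip(alpha, train) if a)
            if yi * score <= 0:
                alpha[i] += 1
                mistakes += 1
        if mistakes == 0:
            break
    kept = [(a, xj, yj) for a, (xj, yj) in zip(alpha, train) if a]

    def classify(x):
        score = sum(a * yj * kernel(xj, x) for a, xj, yj in kept)
        return 1 if score > 0 else -1
    return classify

def cross_val_misclassification(data, kernel, k=10):
    # k-fold cross-validation: fraction of held-out points that are misclassified
    folds = [data[i::k] for i in range(k)]
    errors = 0
    for i in range(k):
        train = [pt for j in range(k) if j != i for pt in folds[j]]
        classify = train_kernel_perceptron(train, kernel)
        errors += sum(classify(x) != y for x, y in folds[i])
    return errors / len(data)

data = make_quadrant_data(100)
print(f"10-fold misclassification rate: {cross_val_misclassification(data, gaussian_rbf):.2%}")
```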
- For the activation-function application in neural networks, we consider a practical pattern recognition example on the synthetic handwritten English letter dataset, stored as image arrays (lettersTrainSet in MATLAB). As already mentioned in the respective subsection, two layered neural networks were constructed with the following layers:
- (a) Image input;
- (b) 2D convolution;
- (c) Batch normalization;
- (d) ReLU (29) vs. Generalized Gaussian Radial Basis Function activation;
- (e) Fully connected layer with output size 10;
- (f) Softmax;
- (g) Classification layer.
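Layers (c), (d) and (f) listed above act elementwise or per channel, and can be sketched on plain lists. The sketch assumes the standard definitions: batch normalization over one channel's mini-batch activations with fixed `gamma = 1` and `beta = 0` (MATLAB's layer learns these and keeps running statistics for inference), ReLU as the baseline activation (the comparison table lists ReLU for the network labeled (29)), and a numerically stable softmax. The GGRBF activation formula is not reproduced here; a GGRBF function would take the place of `relu`. The input values are hypothetical.

```python
import math

def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    # Normalize one channel over the mini-batch, then scale by gamma and shift by beta
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]

def relu(values):
    # Baseline activation: max(0, v) elementwise
    return [max(0.0, v) for v in values]

def softmax(logits):
    # Subtract the maximum logit for numerical stability before exponentiating
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

activations = [0.5, -1.2, 3.0, 0.0, 2.2]
normalized = batch_norm(activations)
print(relu(normalized))
print(softmax([2.0, 1.0, 0.1]))
```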
It should be noted that both training runs of the above network came to a complete stop when the maximum of 30 epochs, amounting to 330 iterations, was reached. The layers were inserted in the same order as listed above, from (a) to (g). Once both networks were set up, the network containing the Generalized Gaussian Radial Basis Function Kernel as its activation layer outperforms its counterpart, i.e., the network whose activation layer is ReLU (29), which reaches an accuracy of 93.48%.
- With the basics of the neural network architecture established in the previous point, we train and run a deep convolutional neural network (DCNN) on representative images of the letters A, B, and C. The seven layers of the DCNN are listed as follows:
- (a) Image input;
- (b) 2D convolution;
- (c) ReLU (29) vs. Generalized Gaussian Radial Basis Function activation;
- (d) 2×2 max pooling with stride [2 2] and padding [0 0 0 0];
- (e) Fully connected layer with output size 3;
- (f) Softmax;
- (g) Classification layer.
Again, the layers were inserted in the order listed above, from (a) to (g), to construct the present DCNN. The DCNN containing the Generalized Gaussian Radial Basis Function Kernel as its activation layer achieves the higher classification accuracy, whereas the DCNN containing ReLU (29) as its activation layer does not match this performance, reaching an accuracy of 91.87%.
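The max-pooling layer (d) of the DCNN keeps the largest value in each non-overlapping 2-by-2 window, halving each spatial dimension. The sketch below assumes the conventional definition and output-size formula $\lfloor (n + p_{\text{before}} + p_{\text{after}} - 2)/2 \rfloor + 1$ along each dimension; the feature map is a hypothetical example, and channel and batch dimensions are omitted.

```python
def pooled_size(n, pool=2, stride=2, pad_before=0, pad_after=0):
    # Output length along one dimension: floor((n + pads - pool) / stride) + 1
    return (n + pad_before + pad_after - pool) // stride + 1

def max_pool_2x2(image):
    # 2x2 max pooling with stride [2 2] and padding [0 0 0 0] on a 2D list
    rows, cols = pooled_size(len(image)), pooled_size(len(image[0]))
    return [[max(image[2 * i][2 * j], image[2 * i][2 * j + 1],
                 image[2 * i + 1][2 * j], image[2 * i + 1][2 * j + 1])
             for j in range(cols)]
            for i in range(rows)]

image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
print(max_pool_2x2(image))  # [[6, 8], [14, 16]]
```

With zero padding and stride 2, an odd trailing row or column is simply dropped; for example, a 5-by-5 map pools to 2-by-2.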
7. Future Directions
Eigenfunction Expansion of GGRBF
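1. the eigenvalues are absolutely summable, that is, $\sum_{j=1}^{\infty} |\lambda_j| < \infty$; and
2. $K(x, y) = \sum_{j=1}^{\infty} \lambda_j \varphi_j(x) \varphi_j(y)$ holds almost everywhere, where the series converges absolutely and uniformly almost everywhere.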
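A finite-sample analogue of the two conditions above can be checked on a Gram matrix: its eigenvalues are nonnegative and sum to its trace, and $\sum_j \lambda_j v_j v_j^{\top}$ reconstructs it. The sketch assumes the Gaussian RBF kernel as a stand-in (the GGRBF formula is not reproduced), uses illustrative sample points, and computes the eigendecomposition with a cyclic Jacobi iteration written in plain Python. It illustrates the discrete picture only and does not compute the eigenfunctions of the integral operator.

```python
import math

def gaussian_kernel(x, y, sigma=0.5):
    # Stand-in kernel K(x, y) = exp(-(x - y)^2 / sigma^2); note K(x, x) = 1
    return math.exp(-((x - y) ** 2) / sigma ** 2)

def jacobi_eigen(matrix, sweeps=50, tol=1e-24):
    # Cyclic Jacobi iteration for a symmetric matrix: returns (eigenvalues, eigenvector columns)
    n = len(matrix)
    a = [row[:] for row in matrix]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns p, q of A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rows p, q of A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors V <- V J
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v

points = [0.0, 0.15, 0.3, 0.5, 0.65, 0.8, 1.0]
gram = [[gaussian_kernel(x, y) for y in points] for x in points]
eigvals, vecs = jacobi_eigen(gram)
n = len(points)
# Rebuild K from sum_k lambda_k v_k v_k^T
recon = [[sum(eigvals[k] * vecs[i][k] * vecs[j][k] for k in range(n)) for j in range(n)]
         for i in range(n)]
print(sorted(eigvals, reverse=True))
print("sum of eigenvalues:", sum(eigvals), "trace:", n)
print("max reconstruction error:",
      max(abs(recon[i][j] - gram[i][j]) for i in range(n) for j in range(n)))
```

The rapid decay of the sorted eigenvalues mirrors the summability condition, and the near-zero reconstruction error is the discrete counterpart of the kernel expansion.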
8. Discussion
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Baddoo, P.J.; Herrmann, B.; McKeon, B.J.; Brunton, S.L. Kernel learning for robust Dynamic Mode Decomposition: Linear and Nonlinear disambiguation optimization. Proc. R. Soc. A 2022, 478, 20210830.
- Gianola, D.; Van Kaam, J.B. Reproducing Kernel Hilbert spaces regression methods for genomic assisted prediction of quantitative traits. Genetics 2008, 178, 2289–2303.
- Attia, N.; Akgül, A.; Seba, D.; Nour, A. Reproducing kernel Hilbert space method for the numerical solutions of fractional cancer tumor models. Math. Methods Appl. Sci. 2023, 46, 7632–7653.
- Mathematical Functions Power Artificial Intelligence. Available online: https://nap.nationalacademies.org/resource/other/deps/illustrating-math/interactive/mathematical-functions-power-ai.html (accessed on 15 February 2024).
- Mathematics and Statistics of Weather Forecasting. Available online: https://nap.nationalacademies.org/resource/other/deps/illustrating-math/interactive/mathematics-and-statistics-of-weather-forecasting.html (accessed on 15 February 2024).
- Kalman, B.L.; Kwasny, S.C. Why tanh: Choosing a sigmoidal function. In Proceedings of the 1992 IJCNN International Joint Conference on Neural Networks, Baltimore, MD, USA, 7–11 June 1992; Volume 4, pp. 578–581.
- Lu, L.; Shin, Y.; Su, Y.; Karniadakis, G.E. Dying ReLU and initialization: Theory and numerical examples. arXiv 2019, arXiv:1903.06733.
- Fasshauer, G.E. Meshfree Approximation Methods with MATLAB; World Scientific: Singapore, 2007; Volume 6.
- Brunton, S.L.; Kutz, J.N. Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control; Cambridge University Press: Cambridge, UK, 2022.
- Stein, M.L. Interpolation of Spatial Data: Some Theory for Kriging; Springer Science & Business Media: Berlin/Heidelberg, Germany, 1999.
- Rosenfeld, J.A.; Russo, B.; Kamalapurkar, R.; Johnson, T.T. The occupation kernel method for nonlinear system identification. arXiv 2019, arXiv:1909.11792.
- Williams, C.K.; Rasmussen, C.E. Gaussian Processes for Machine Learning; MIT Press: Cambridge, MA, USA, 2006; Volume 2.
- Park, J.; Sandberg, I.W. Universal approximation using radial-basis-function networks. Neural Comput. 1991, 3, 246–257.
- Singh, H. A new kernel function for better AI methods. In Proceedings of the 2023 Spring Eastern Sectional Meeting, Virtual, 1–2 April 2023.
- Karimi, N.; Kazem, S.; Ahmadian, D.; Adibi, H.; Ballestra, L. On a generalized Gaussian radial basis function: Analysis and applications. Eng. Anal. Bound. Elem. 2020, 112, 46–57.
- Aronszajn, N. Theory of reproducing kernels. Trans. Am. Math. Soc. 1950, 68, 337–404.
- Steinwart, I.; Hush, D.; Scovel, C. An explicit description of the reproducing kernel Hilbert spaces of Gaussian RBF kernels. IEEE Trans. Inf. Theory 2006, 52, 4635–4643.
- Barnes, E.W.V. The asymptotic expansion of integral functions defined by Taylor’s series. Philos. Trans. R. Soc. London. Ser. A Contain. Pap. A Math. Phys. Character 1906, 206, 249–297.
- Abramowitz, M.; Stegun, I.A. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables; US Government Printing Office: Washington, DC, USA, 1968; Volume 55.
- Gradshteyn, I.S.; Ryzhik, I.M. Table of Integrals, Series, and Products, 7th ed.; Academic Press: Cambridge, MA, USA; Elsevier: Amsterdam, The Netherlands, 2007.
- Christmann, A.; Steinwart, I. Support Vector Machines; Springer: Berlin/Heidelberg, Germany, 2008.
- Steinwart, I. Consistency of support vector machines and other regularized kernel classifiers. IEEE Trans. Inf. Theory 2005, 51, 128–142.
- Devroye, L.; Györfi, L.; Lugosi, G. A Probabilistic Theory of Pattern Recognition; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2013; Volume 31.
- Kidger, P.; Lyons, T. Universal approximation with deep narrow networks. In Proceedings of the Conference on Learning Theory, Graz, Austria, 9–12 July 2020; Proceedings of Machine Learning Research. pp. 2306–2327.
- Cybenko, G. Approximation by Superpositions of a Sigmoidal Function. Math. Control. Signals Syst. 1989, 2, 303–314.
- Hornik, K. Approximation capabilities of multilayer feedforward networks. Neural Netw. 1991, 4, 251–257.
- Pinkus, A. Approximation theory of the MLP model in neural networks. Acta Numer. 1999, 8, 143–195.
- Hermite, M. Sur un Nouveau Développement en Série des Fonctions; Imprimerie de Gauthier-Villars: 1864; Cambridge University Press: Cambridge, UK, 2011.
- Fasshauer, G.E.; McCourt, M.J. Stable evaluation of Gaussian radial basis function interpolants. SIAM J. Sci. Comput. 2012, 34, A737–A762.
- Rasmussen, C.E.; Williams, C.K. Gaussian Processes for Machine Learning; Springer: Berlin/Heidelberg, Germany, 2006; Volume 1.
- Zhu, H.; Williams, C.K.; Rohwer, R.; Morciniec, M. Gaussian Regression and Optimal Finite Dimensional Linear Models; Neural Networks and Machine Learning; Springer: Berlin/Heidelberg, Germany, 1997.
- Hendrycks, D.; Gimpel, K. Gaussian Error Linear Units (GELUs). arXiv 2016, arXiv:1606.08415.
Training with GGRBF (on a Single CPU)

Epoch | Iteration | Runtime (hh:mm:ss) | Accuracy (%) | Batch Loss | Learning Rate |
---|---|---|---|---|---|
1 | 1 | 00:00:04 | 8.59 | 2.5329 | 0.0010 |
2 | 50 | 00:00:10 | 75.78 | 1.0087 | 0.0010 |
3 | 100 | 00:00:15 | 93.75 | 0.4377 | 0.0010 |
4 | 150 | 00:00:21 | 99.22 | 0.2452 | 0.0010 |
6 | 200 | 00:00:26 | 99.22 | 0.1790 | 0.0010 |
7 | 250 | 00:00:31 | 100.00 | 0.1028 | 0.0010 |
8 | 300 | 00:00:36 | 100.00 | 0.0681 | 0.0010 |
9 | 350 | 00:00:41 | 100.00 | 0.0458 | 0.0010 |
10 | 390 | 00:00:45 | 100.00 | 0.0382 | 0.0010 |
Training with (29) (on a Single CPU)

Epoch | Iteration | Runtime (hh:mm:ss) | Accuracy (%) | Batch Loss | Learning Rate |
---|---|---|---|---|---|
1 | 1 | 00:00:08 | 8.59 | 2.9976 | 0.0010 |
2 | 50 | 00:00:15 | 78.91 | 0.6053 | 0.0010 |
3 | 100 | 00:00:19 | 86.72 | 0.4108 | 0.0010 |
4 | 150 | 00:00:25 | 90.62 | 0.3122 | 0.0010 |
6 | 200 | 00:00:30 | 96.88 | 0.1632 | 0.0010 |
7 | 250 | 00:00:36 | 99.22 | 0.0695 | 0.0010 |
8 | 300 | 00:00:41 | 99.22 | 0.0778 | 0.0010 |
9 | 350 | 00:00:46 | 100.00 | 0.0390 | 0.0010 |
10 | 390 | 00:00:49 | 100.00 | 0.0355 | 0.0010 |
AI Learning Architecture | Activation Function | Reference | Min. Error | Misclass. % | Accuracy % | Runtime (s) |
---|---|---|---|---|---|---|
Kernel Regression | GRBF | Figure 8 | 0.00230 | - | - | - |
Kernel Regression | GGRBF | Figure 8 |  | - | - | - |
Kernel Regression | GRBF | Figure 9 | 0.00100 | - | - | - |
Kernel Regression | GGRBF | Figure 9 |  | - | - | - |
SVM | GRBF | Figure 10 | - | 5.75 | 94.25 | - |
SVM | Sigmoid | Figure 10 | - | 4.50 | 95.50 | - |
SVM | GGRBF | Figure 10 | - |  |  | - |
AF in NN | GGRBF | Table 1 | - |  |  | 0.43077 |
AF in NN | ReLU | Table 2 | - | 6.52 | 93.48 | 1.8935 |
DCNN | GGRBF | Figure 13 | - |  |  | - |
DCNN | ReLU | Figure 14 | - | 8.13 | 91.87 | - |
Training with GGRBF (on a Single CPU)

Epoch | Iteration | Runtime (hh:mm:ss) | Accuracy (%) | Batch Loss | Learning Rate |
---|---|---|---|---|---|
1 | 1 | 00:00:00 | 11.72 | 2.5329 | 0.0010 |
2 | 50 | 00:00:04 | 72.66 | 1.0087 | 0.0010 |
3 | 100 | 00:00:09 | 84.38 | 0.4377 | 0.0010 |
4 | 150 | 00:00:13 | 89.06 | 0.2452 | 0.0010 |
6 | 200 | 00:00:18 | 98.44 | 0.1790 | 0.0010 |
7 | 250 | 00:00:23 | 100.00 | 0.1028 | 0.0010 |
8 | 300 | 00:00:27 | 100.00 | 0.0681 | 0.0010 |
9 | 350 | 00:00:32 | 100.00 | 0.0458 | 0.0010 |
11 | 400 | 00:00:36 | 100.00 | 0.0382 | 0.0010 |
12 | 450 | 00:00:41 | 100.00 | 0.0382 | 0.0010 |
13 | 500 | 00:00:45 | 100.00 | 0.0382 | 0.0010 |
14 | 546 | 00:00:50 | 100.00 | 0.0382 | 0.0010 |
Training with (37) (on a Single CPU)

Epoch | Iteration | Runtime (hh:mm:ss) | Accuracy (%) | Batch Loss | Learning Rate |
---|---|---|---|---|---|
1 | 1 | 00:00:00 | 10.94 | 2.7556 | 0.0010 |
2 | 50 | 00:00:04 | 82.81 | 0.5065 | 0.0010 |
3 | 100 | 00:00:07 | 97.66 | 0.1923 | 0.0010 |
4 | 150 | 00:00:11 | 100.00 | 0.0738 | 0.0010 |
6 | 200 | 00:00:14 | 100.00 | 0.0446 | 0.0010 |
7 | 250 | 00:00:28 | 100.00 | 0.0262 | 0.0010 |
8 | 300 | 00:00:21 | 100.00 | 0.0211 | 0.0010 |
9 | 350 | 00:00:25 | 100.00 | 0.0173 | 0.0010 |
11 | 400 | 00:00:29 | 100.00 | 0.0099 | 0.0010 |
12 | 450 | 00:00:32 | 100.00 | 0.0085 | 0.0010 |
13 | 500 | 00:00:36 | 100.00 | 0.0085 | 0.0010 |
14 | 546 | 00:00:39 | 100.00 | 0.0068 | 0.0010 |
© 2024 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Singh, H. Machine Learning Application of Generalized Gaussian Radial Basis Function and Its Reproducing Kernel Theory. Mathematics 2024, 12, 829. https://doi.org/10.3390/math12060829