Shallow Fully Connected Neural Network Training by Forcing Linearization into Valid Region and Balancing Training Rates
Abstract
1. Introduction
2. Proposed Supervisory Training Rule for SFCNN
2.1. Derivation of the Proposed Training Rule
2.2. Determining Hyperparameters
3. Results and Discussion
3.1. A pH Process
3.2. Representative Datasets for Neural Network Training: Boston Housing Price and Automobile Mileage Per Gallon
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
| Method | Hidden Layer with 3 Nodes (Mean) | Hidden Layer with 3 Nodes (Standard Deviation) | Hidden Layer with 7 Nodes (Mean) | Hidden Layer with 7 Nodes (Standard Deviation) |
|---|---|---|---|---|
| GDM | 7.319 × 10⁻³ | 4.417 × 10⁻⁴ | 4.046 × 10⁻³ | 1.529 × 10⁻⁴ |
| SGD | 1.697 × 10⁻³ | 7.664 × 10⁻⁵ | 1.681 × 10⁻³ | 7.235 × 10⁻⁵ |
| ADAM | 1.668 × 10⁻³ | 7.142 × 10⁻⁵ | 1.668 × 10⁻³ | 7.142 × 10⁻⁵ |
| LM | 1.201 × 10⁻³ | 3.603 × 10⁻⁴ | 3.068 × 10⁻⁴ | 6.079 × 10⁻⁵ |
| Proposed | 1.140 × 10⁻³ | 3.284 × 10⁻⁴ | 2.825 × 10⁻⁴ | 4.940 × 10⁻⁵ |
| Method | Iterations to Converge (Mean) | Iterations to Converge (Standard Deviation) | Computation Time to Converge, s (Mean) | Computation Time to Converge, s (Standard Deviation) |
|---|---|---|---|---|
| GDM | 94,494 | 843.1 | 130.4 | 16.4 |
| SGD | 6318.3 | 2815.3 | 29.0 | 12.5 |
| ADAM | 637.57 | 180.25 | 3.82 | 1.23 |
| LM | 145.55 | 63.79 | 8.68 | 3.74 |
| Proposed | 107.40 | 44.99 | 7.94 | 3.30 |
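In the tables above, LM and the proposed method are Jacobian-based (damped Gauss-Newton) updates, in contrast to the gradient-based GDM, SGD, and ADAM rows. For orientation only, the following is a minimal NumPy sketch of a textbook Levenberg-Marquardt step for a one-hidden-layer fully connected network. It does not implement the authors' proposed training rule; the network size, damping schedule, and toy data are all assumptions.

```python
import numpy as np

# Minimal sketch of a Levenberg-Marquardt (LM) loop for a shallow network
# y_hat = w2 . tanh(W1 x + b1) + b2. Illustrative textbook LM only,
# NOT the paper's proposed rule; all constants below are assumptions.

rng = np.random.default_rng(0)

def unpack(theta, n_in, n_hid):
    # Split the flat parameter vector into layer weights and biases.
    i = n_hid * n_in
    W1 = theta[:i].reshape(n_hid, n_in)
    b1 = theta[i:i + n_hid]
    w2 = theta[i + n_hid:i + 2 * n_hid]
    b2 = theta[-1]
    return W1, b1, w2, b2

def residuals(theta, X, y, n_hid):
    W1, b1, w2, b2 = unpack(theta, X.shape[1], n_hid)
    return np.tanh(X @ W1.T + b1) @ w2 + b2 - y

def jacobian_fd(theta, X, y, n_hid, eps=1e-6):
    # Finite-difference Jacobian of the residual vector w.r.t. theta.
    r0 = residuals(theta, X, y, n_hid)
    J = np.empty((r0.size, theta.size))
    for k in range(theta.size):
        t = theta.copy()
        t[k] += eps
        J[:, k] = (residuals(t, X, y, n_hid) - r0) / eps
    return J, r0

def lm_train(X, y, n_hid=3, mu=1e-2, iters=200):
    n_par = n_hid * X.shape[1] + 2 * n_hid + 1
    theta = 0.5 * rng.standard_normal(n_par)
    for _ in range(iters):
        J, r = jacobian_fd(theta, X, y, n_hid)
        # Damped Gauss-Newton step: (J^T J + mu I) d = -J^T r
        d = np.linalg.solve(J.T @ J + mu * np.eye(n_par), -J.T @ r)
        if np.mean(residuals(theta + d, X, y, n_hid) ** 2) < np.mean(r ** 2):
            theta += d
            mu = max(0.5 * mu, 1e-12)  # step accepted: relax damping
        else:
            mu *= 2.0                  # step rejected: increase damping
    return theta

# Toy usage: fit a smooth 1-D nonlinearity and report the final training MSE.
X = np.linspace(-1.0, 1.0, 50).reshape(-1, 1)
y = np.sin(2.0 * X[:, 0])
theta = lm_train(X, y)
print("final train MSE:", np.mean(residuals(theta, X, y, 3) ** 2))
```

The damping factor mu plays the role the tables' second-order methods rely on: large mu falls back toward gradient descent, small mu approaches a Gauss-Newton step, which is why such methods typically converge in far fewer iterations than GDM or SGD at a higher per-iteration cost.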
| Method | Boston Housing Prices (Mean) | Boston Housing Prices (Standard Deviation) | Automobile MPG (Mean) | Automobile MPG (Standard Deviation) |
|---|---|---|---|---|
| GDM | Did not converge | Did not converge | Did not converge | Did not converge |
| SGD | 1.127 × 10⁻³ | 1.066 × 10⁻³ | 1.405 × 10⁻³ | 1.215 × 10⁻³ |
| ADAM | 5.858 × 10⁻⁴ | 4.368 × 10⁻⁴ | 6.417 × 10⁻⁴ | 7.005 × 10⁻⁵ |
| LM | 1.001 × 10⁻⁴ | 2.653 × 10⁻⁵ | 2.477 × 10⁻⁴ | 4.010 × 10⁻⁵ |
| Proposed | 7.393 × 10⁻⁵ | 1.649 × 10⁻⁵ | 2.407 × 10⁻⁴ | 2.990 × 10⁻⁵ |
| Method | Boston Housing Prices (Mean) | Boston Housing Prices (Standard Deviation) | Automobile MPG (Mean) | Automobile MPG (Standard Deviation) |
|---|---|---|---|---|
| GDM | Did not converge | Did not converge | Did not converge | Did not converge |
| SGD | 4330 | 5123 | 2833 | 4437 |
| ADAM | 4097 | 2579 | 3156 | 5506 |
| LM | 29.9 | 2.4 | 30.7 | 3.3 |
| Proposed | 26.6 | 2.0 | 29.1 | 2.3 |
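Each cell in the tables above is a mean and standard deviation over repeated training runs from random initial weights. Below is a hedged sketch of that aggregation pattern using a plain textbook Adam loop on a toy problem; the optimizer constants, run count, and data are assumptions, not the paper's settings.

```python
import numpy as np

# Sketch: train the same shallow network repeatedly from random initial
# weights and aggregate the final mean squared errors, as in the tables
# above. Plain textbook Adam; all constants here are assumptions.

rng = np.random.default_rng(1)

def mse_and_grad(theta, X, y, n_hid):
    # Loss and analytic gradient for y_hat = w2 . tanh(W1 x + b1) + b2.
    n_in = X.shape[1]
    i = n_hid * n_in
    W1 = theta[:i].reshape(n_hid, n_in)
    b1 = theta[i:i + n_hid]
    w2 = theta[i + n_hid:i + 2 * n_hid]
    b2 = theta[-1]
    H = np.tanh(X @ W1.T + b1)        # hidden activations, shape (N, n_hid)
    e = H @ w2 + b2 - y               # residuals, shape (N,)
    g = 2.0 * e / len(y)              # dLoss/dy_hat
    gW1 = (g[:, None] * (1.0 - H**2) * w2).T @ X
    gb1 = (g[:, None] * (1.0 - H**2) * w2).sum(axis=0)
    gw2 = H.T @ g
    gb2 = g.sum()
    return np.mean(e**2), np.concatenate([gW1.ravel(), gb1, gw2, [gb2]])

def train_adam(X, y, n_hid=3, lr=1e-2, iters=2000):
    n_par = n_hid * X.shape[1] + 2 * n_hid + 1
    theta = 0.5 * rng.standard_normal(n_par)   # fresh random initialization
    m = np.zeros(n_par)
    v = np.zeros(n_par)
    for t in range(1, iters + 1):
        _, grad = mse_and_grad(theta, X, y, n_hid)
        m = 0.9 * m + 0.1 * grad                 # first-moment estimate
        v = 0.999 * v + 0.001 * grad**2          # second-moment estimate
        m_hat = m / (1.0 - 0.9**t)               # bias corrections
        v_hat = v / (1.0 - 0.999**t)
        theta -= lr * m_hat / (np.sqrt(v_hat) + 1e-8)
    return mse_and_grad(theta, X, y, n_hid)[0]

# Ten independent runs, then report mean and standard deviation of the MSE.
X = np.linspace(-1.0, 1.0, 50).reshape(-1, 1)
y = np.sin(2.0 * X[:, 0])
finals = [train_adam(X, y) for _ in range(10)]
print(f"mean = {np.mean(finals):.3e}, std = {np.std(finals):.3e}")
```

Reporting the spread across initializations, not just the best run, is what makes the standard-deviation columns meaningful: a small mean with a large standard deviation indicates a method that is sensitive to its starting point.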
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Heo, J.P.; Im, C.G.; Ryu, K.H.; Sung, S.W.; Yoo, C.; Yang, D.R. Shallow Fully Connected Neural Network Training by Forcing Linearization into Valid Region and Balancing Training Rates. Processes 2022, 10, 1157. https://doi.org/10.3390/pr10061157