Evaluating the Learning Procedure of CNNs through a Sequence of Prognostic Tests Utilising Information Theoretical Measures
Abstract
1. Introduction
2. Related Work
2.1. Understanding Deep Neural Networks
2.2. Explainable Machine Learning
3. Background
3.1. Information Entropy
3.2. Joint Entropy
3.3. Conditional Entropy
3.4. Mutual Information
3.5. Data Processing Inequalities
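For reference, Sections 3.1–3.5 cover the standard quantities below, stated here for discrete random variables X and Y following the usual textbook definitions (Cover and Thomas); taking logarithms to base 2, so that all quantities are in bits, is an assumption made here for consistency with the tables later in the article.

```latex
\begin{aligned}
H(X)       &= -\sum_{x} p(x)\,\log_2 p(x)            && \text{(information entropy)} \\
H(X,Y)     &= -\sum_{x,y} p(x,y)\,\log_2 p(x,y)      && \text{(joint entropy)} \\
H(X \mid Y) &= H(X,Y) - H(Y)                          && \text{(conditional entropy)} \\
I(X;Y)     &= H(X) + H(Y) - H(X,Y)                    && \text{(mutual information)} \\
I(X;Z)     &\le I(X;Y) \ \text{for a Markov chain } X \to Y \to Z && \text{(data processing inequality)}
\end{aligned}
```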
4. Methodology
4.1. Experimentation Methodology
- A convolutional neural network model is designed and compiled.
- The compiled model is trained, and the network weights are saved after every epoch.
- The model states selected for visualisation are reloaded, and the outputs of the hidden layers (activations) are extracted into predefined lists.
- Finally, information quantities are calculated from these outputs and plotted for evaluation (a minimal code sketch of this pipeline follows the list).
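A minimal sketch of the four steps above, assuming a Keras/TensorFlow implementation; the architecture, layer names, sample sizes, and file paths are illustrative assumptions, not the authors' code.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Step 1: design and compile a small CNN (an assumed example architecture).
model = keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, kernel_size=3, activation="relu", name="conv_1"),
    layers.Conv2D(64, kernel_size=3, activation="relu", name="conv_2"),
    layers.Flatten(),
    layers.Dense(10, activation="softmax", name="output"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

# Step 2: train the compiled model, saving the network weights after every epoch.
(x_train, y_train), _ = keras.datasets.mnist.load_data()
x_train = x_train[..., np.newaxis].astype("float32") / 255.0
checkpoint = keras.callbacks.ModelCheckpoint(
    "weights_epoch_{epoch:02d}.weights.h5", save_weights_only=True, save_freq="epoch")
model.fit(x_train, y_train, epochs=10, batch_size=128, callbacks=[checkpoint])

# Step 3: reload the epochs selected for visualisation and extract hidden-layer activations.
extractor = keras.Model(
    inputs=model.input,
    outputs=[model.get_layer(n).output for n in ("conv_1", "conv_2", "output")])
activations_per_epoch = []
for epoch in (1, 5, 10):  # epochs chosen for visualisation (example choice)
    model.load_weights(f"weights_epoch_{epoch:02d}.weights.h5")
    activations_per_epoch.append(extractor.predict(x_train[:2000], verbose=0))

# Step 4: information quantities are then estimated from these activations (see Section 4.3).
```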
4.2. Datasets
4.3. Calculation of Information Quantities
4.4. Information Quantities of the MNIST Images
5. Experiments and Results
- How does the number of layers in the network affect the learning progress?
- What are the effects of kernel size on a CNN's learning behaviour?
- What are the effects of a dropout layer on the learning of CNNs?
5.1. Experiment 1—Effect of Number of Convolutional Layers on Learning
- Mutual information between the model's input and output, as shown in Figure 6a,c;
- Mutual information between the model's output and the true labels, as shown in Figure 6b,d;
- Mutual information between the input and output of the interim layers of the network with six convolutional layers, for the MNIST and Fashion-MNIST datasets, as shown in Figure 7 (a sketch of how such layer-wise quantities can be estimated follows the list).
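One common way to estimate such quantities from layer activations is a simple binning estimator in the spirit of Shwartz-Ziv and Tishby: each activation is discretised into bins, every sample's binned activation vector is treated as one discrete symbol, and entropies are read off the empirical frequencies. The sketch below illustrates that approach; the bin count and the identification with the authors' exact estimator are assumptions.

```python
import numpy as np

def discretize(layer_output, n_bins=30):
    """Bin each activation, then hash each sample's binned vector into one discrete symbol."""
    flat = layer_output.reshape(len(layer_output), -1)
    edges = np.linspace(flat.min(), flat.max(), n_bins)
    binned = np.digitize(flat, edges)
    return np.array([hash(row.tobytes()) for row in binned])

def entropy_of_symbols(symbols):
    """Empirical Shannon entropy (bits) of a sequence of discrete symbols."""
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def mutual_information(sym_a, sym_b):
    """I(A;B) = H(A) + H(B) - H(A,B), estimated from empirical frequencies."""
    joint = np.array([hash((a, b)) for a, b in zip(sym_a, sym_b)])
    return entropy_of_symbols(sym_a) + entropy_of_symbols(sym_b) - entropy_of_symbols(joint)

# Example usage: MI between the inputs and one layer's activations at one training epoch.
# x_sample: (N, 28, 28, 1) inputs; layer_act: (N, ...) activations from the extractor above.
# mi = mutual_information(discretize(x_sample), discretize(layer_act))
```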
5.2. Experiment 2—Effect of Kernel Size of Convolutional Layer on Learning
5.3. Experiment 3—Effect of Dropout Layer on Learning
5.4. Main Findings
5.4.1. Effects of Number of Convolutional Layers on Learning
5.4.2. Effect of Kernel Size of Convolutional Layers on Learning
5.4.3. Effect of Dropout Layer on Learning
5.5. Future Work
- (1) By applying different numbers of training samples to the model, the effect of the amount of training data on learning could be observed with the same approach.
- (2) By using different optimisation functions for the model during the learning process, the effect of the optimiser on learning could be observed.
- (3) The reliability of the proposed method could be assessed by trying different estimation methods for the mutual information calculation (a brief sketch follows the list).
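As one concrete instance of item (3), a k-nearest-neighbour (KSG-style) estimator such as the one available in scikit-learn could be compared against a binning estimate on the same data. This is an illustrative suggestion, not part of the authors' study.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

# Synthetic example: y depends on x, so the true I(X;Y) is positive.
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = x + 0.5 * rng.normal(size=1000)

# kNN-based (KSG-style) estimate; units are assumed to be nats, converted to bits below.
mi_knn = mutual_info_regression(x.reshape(-1, 1), y, n_neighbors=3)[0]
print(f"kNN estimate of I(X;Y): {mi_knn / np.log(2):.3f} bits")
```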
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A
References
| Information Quantity | Value |
|---|---|
| Entropy of Image 1, H(X) | 1.5731 |
| Entropy of Image 2, H(Y) | 1.5342 |
| Mutual Information, I(X;Y) | 1.4191 |
| Conditional Entropy, H(X\|Y) | 0.1540 |
| Conditional Entropy, H(Y\|X) | 0.1151 |
| Joint Entropy, H(X,Y) | 1.6882 |
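Quantities such as those in the table above can, in principle, be estimated from a joint histogram of the two images' pixel values; note that the tabulated values are mutually consistent (for example, H(X) + H(Y) - I(X;Y) = 1.6882 = H(X,Y)). Below is a minimal sketch of such a calculation; the bin count and the choice of images are illustrative assumptions.

```python
import numpy as np

def image_information_quantities(img_x, img_y, n_bins=16):
    """Entropies, conditional/joint entropies and MI (in bits) of two equally sized
    grayscale images, estimated from their pixel-value histograms."""
    x, y = img_x.ravel(), img_y.ravel()
    joint_counts, _, _ = np.histogram2d(x, y, bins=n_bins)
    p_xy = joint_counts / joint_counts.sum()      # joint distribution p(x, y)
    p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)  # marginal distributions

    def H(p):
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    h_x, h_y, h_xy = H(p_x), H(p_y), H(p_xy)
    return {"H(X)": h_x, "H(Y)": h_y, "H(X,Y)": h_xy,
            "H(X|Y)": h_xy - h_y, "H(Y|X)": h_xy - h_x,
            "I(X;Y)": h_x + h_y - h_xy}
```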
| Kernel Size | Training Time per Epoch (ms) | Validation Loss | Validation Accuracy |
|---|---|---|---|
| 3 × 3 | 1026 | 0.0549 | 0.9920 |
| 5 × 5 | 2029 | 0.0493 | 0.9927 |
| 7 × 7 | 2031 | 0.0366 | 0.9931 |
| 9 × 9 | 3047 | 0.0666 | 0.9933 |
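For context on how Experiments 2 and 3 vary the architecture, the sketch below shows the kind of model-building helper that could be used to sweep the kernel size (as in the table above) and to toggle a dropout layer; the layer arrangement and dropout rate are illustrative assumptions, not the authors' exact configuration.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_cnn(kernel_size=3, use_dropout=False, dropout_rate=0.25):
    """Small CNN whose kernel size and dropout usage can be varied per experiment."""
    net = [
        layers.Input(shape=(28, 28, 1)),
        layers.Conv2D(32, kernel_size=kernel_size, activation="relu", padding="same"),
        layers.Conv2D(64, kernel_size=kernel_size, activation="relu", padding="same"),
        layers.MaxPooling2D(),
    ]
    if use_dropout:
        net.append(layers.Dropout(dropout_rate))
    net += [layers.Flatten(), layers.Dense(10, activation="softmax")]
    model = keras.Sequential(net)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Experiment 2: kernel sizes 3x3, 5x5, 7x7 and 9x9, as in the table above.
models_by_kernel = {k: build_cnn(kernel_size=k) for k in (3, 5, 7, 9)}
# Experiment 3: otherwise identical models with and without a dropout layer.
with_dropout, without_dropout = build_cnn(use_dropout=True), build_cnn(use_dropout=False)
```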
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).