Efficient Hyperbolic Perceptron for Image Classification
Abstract
1. Introduction
- A fully hyperbolic deep neural architecture for image tasks, called hyperbolic ResMLP (HR-MLP), is presented to explore the potential of the hyperbolic perceptron for high-dimensional data.
- The proposed HR-MLP uses Lorentz cross-patch and cross-channel layers, which are manifold-preserving neural operators.
- Results on CIFAR10, CIFAR100, and MiniImageNet show performance comparable or superior to that of its Euclidean counterpart, while offering much better interoperability.
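To make "manifold-preserving" concrete, the following is a minimal sketch, assuming the Lorentz model with curvature -1 and a construction similar in spirit to the fully hyperbolic linear layers of Chen et al. (2022): each layer computes new spatial coordinates with an ordinary Euclidean map and then recomputes the time coordinate x0, so every output lies on the hyperboloid by construction. The functions `lift`, `cross_channel`, and `cross_patch`, the tensor sizes, and the random weights are hypothetical illustrations, not the HR-MLP implementation.

```python
import math
import random

def lorentz_inner(x, y):
    """Lorentzian inner product <x, y>_L = -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))

def lift(space):
    """Solve <x, x>_L = -1 for x0 >= 1, placing the spatial part on the hyperboloid."""
    x0 = math.sqrt(1.0 + sum(s * s for s in space))
    return [x0] + list(space)

def cross_channel(tokens, W, b):
    """Hypothetical cross-channel layer: for each patch, map its full (C+1)-vector
    to C new spatial coordinates with an affine map, then re-lift."""
    out = []
    for x in tokens:
        space = [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]
        out.append(lift(space))
    return out

def cross_patch(tokens, A):
    """Hypothetical cross-patch layer: mix the spatial coordinates of all patches
    with a P x P matrix A (shared across channels), then re-lift each patch."""
    n_space = len(tokens[0]) - 1
    out = []
    for row in A:
        space = [sum(a * tok[c + 1] for a, tok in zip(row, tokens))
                 for c in range(n_space)]
        out.append(lift(space))
    return out

random.seed(0)
P, C = 4, 3  # hypothetical sizes: 4 patches, 3 spatial channels
tokens = [lift([random.gauss(0, 1) for _ in range(C)]) for _ in range(P)]
W = [[random.gauss(0, 0.5) for _ in range(C + 1)] for _ in range(C)]
b = [0.0] * C
A = [[random.gauss(0, 0.5) for _ in range(P)] for _ in range(P)]

h = cross_patch(cross_channel(tokens, W, b), A)
for x in h:
    print(round(lorentz_inner(x, x), 9))  # -1.0 for every patch
```

Because `lift` always solves the hyperboloid constraint for x0, any Euclidean mixing of the spatial coordinates, whether across channels or across patches, stays on the upper sheet of the hyperboloid without an extra projection or exponential map.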
2. Related Works
2.1. Image Classification in Euclidean Space
2.2. Hyperbolic Deep Learning
3. Preliminary
3.1. Topological Spaces and Manifolds
- The entire space and the empty set must both be open.
- The intersection of any finite number of open sets must be open.
- The union of any number of open sets must be open.
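The three axioms above can be checked mechanically when the underlying set is finite. The sketch below is an illustration only; the function `is_topology` and the example collections are hypothetical and not taken from the article.

```python
from itertools import combinations

def is_topology(X, opens):
    """Check the open-set axioms for a collection `opens` of subsets of a finite set X.

    On a finite set every family of open sets is finite, so closure under
    pairwise unions and intersections implies closure under arbitrary ones
    (by induction); the empty family is covered by requiring X and {} directly.
    """
    X = frozenset(X)
    T = {frozenset(U) for U in opens}
    if not all(U <= X for U in T):
        return False  # every open set must be a subset of X
    if X not in T or frozenset() not in T:
        return False  # axiom 1: the entire space and the empty set are open
    for U, V in combinations(T, 2):
        if (U & V) not in T:
            return False  # axiom 2: finite intersections are open
        if (U | V) not in T:
            return False  # axiom 3: unions are open
    return True

X = {1, 2, 3}
print(is_topology(X, [set(), {1}, {1, 2}, X]))  # True: a nested chain of open sets
print(is_topology(X, [set(), {1}, {2}, X]))     # False: {1} | {2} = {1, 2} is missing
```

On an infinite space the union axiom cannot be verified pairwise like this, which is why axiom 3 allows any number of open sets while axiom 2 is restricted to finitely many.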
3.2. Isometric Models in Hyperbolic Space
3.2.1. Lorentz Model
3.2.2. Klein Model
3.2.3. Poincaré Model
3.2.4. Poincaré Half-Plane Model
3.2.5. Hemisphere Model
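The models listed in Section 3.2 are mutually isometric: explicit maps move a point between them without changing hyperbolic distances. The following is a minimal sketch, assuming curvature -1 and the standard textbook maps; it is not code from the article, all function names are hypothetical, and the half-plane map is the 2-D Cayley transform only. It converts two Poincaré-disk points into the other models and checks that each model's own distance formula gives the same value.

```python
import math

def poincare_to_lorentz(p):
    r2 = sum(c * c for c in p)
    return [(1 + r2) / (1 - r2)] + [2 * c / (1 - r2) for c in p]

def lorentz_to_poincare(x):
    return [c / (1 + x[0]) for c in x[1:]]

def lorentz_to_klein(x):
    return [c / x[0] for c in x[1:]]

def lorentz_to_hemisphere(x):
    # Point on the upper unit hemisphere: (x_s / x0, 1 / x0), last coordinate > 0.
    return [c / x[0] for c in x[1:]] + [1 / x[0]]

def poincare_disk_to_half_plane(p):
    # 2-D only: Cayley transform w = i(1 + z) / (1 - z) maps the unit disk to Im w > 0.
    z = complex(p[0], p[1])
    return 1j * (1 + z) / (1 - z)

def lorentz_dist(x, y):
    inner = -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))
    return math.acosh(max(1.0, -inner))  # clamp guards against rounding below 1

def poincare_dist(p, q):
    d2 = sum((a - b) ** 2 for a, b in zip(p, q))
    np2, nq2 = sum(a * a for a in p), sum(b * b for b in q)
    return math.acosh(1 + 2 * d2 / ((1 - np2) * (1 - nq2)))

def klein_dist(k, l):
    num = 1 - sum(a * b for a, b in zip(k, l))
    den = math.sqrt((1 - sum(a * a for a in k)) * (1 - sum(b * b for b in l)))
    return math.acosh(max(1.0, num / den))

def half_plane_dist(w1, w2):
    return math.acosh(1 + abs(w1 - w2) ** 2 / (2 * w1.imag * w2.imag))

p, q = [0.3, -0.2], [-0.5, 0.4]  # two points inside the Poincaré disk
x, y = poincare_to_lorentz(p), poincare_to_lorentz(q)
distances = {
    "poincare": poincare_dist(p, q),
    "lorentz": lorentz_dist(x, y),
    "klein": klein_dist(lorentz_to_klein(x), lorentz_to_klein(y)),
    "half-plane": half_plane_dist(poincare_disk_to_half_plane(p),
                                  poincare_disk_to_half_plane(q)),
}
print(distances)  # all four values agree up to floating-point rounding
j = lorentz_to_hemisphere(x)  # lies on the unit sphere with j[-1] > 0
```

Agreement of the distances is what "isometric" means here: the maps change coordinates, not geometry, so a network may compute in whichever model is numerically convenient.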
4. Methodology
4.1. Lorentz Linear Embedding
4.2. Lorentz Linear Layer
4.3. Lorentz Cross-Channel and Cross-Patch Layers
4.4. Hyperbolic MLP Head
5. Experiment
5.1. Datasets and Metrics
5.2. Implementation
6. Results
7. Discussion
8. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Methods | Parameters (M) | FLOPs (M) | CIFAR10 | CIFAR100 | MiniImageNet |
---|---|---|---|---|---|
CNN | 2.112 | 10 | 61.86% | 29.59% | 20.15% |
MLP-Mixer | 1.005 | 13.3 | 59.01% | 27.01% | 30.30% |
ResMLP-S12 | 0.685 | 81 | 62.77% | 35.29% | 36.83% |
ViT | 5.912 | 89.2 | 59.2% | 32.23% | 21.67% |
EfficientViT-B0 | 2.142 | 10 | 75.04% | 46.05% | 34.43% |
HR-MLP (Ours) | 1.038 | 219 | 76.44% | 48.55% | 37.47% |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ahsan, A.O.; Tang, S.; Peng, W. Efficient Hyperbolic Perceptron for Image Classification. Electronics 2023, 12, 4027. https://doi.org/10.3390/electronics12194027