Efficient Hyperbolic Perceptron for Image Classification

Ahsan, Ahmad Omar; Tang, Susanna; Peng, Wei

doi:10.3390/electronics12194027


by Ahmad Omar Ahsan ¹, Susanna Tang ² and Wei Peng ³,*

¹ Department of Biomedical Engineering, University of Calgary, Calgary, AB T2N 1N4, Canada
² Fatima Fellowship, USA
³ Stanford Medicine, Stanford University, Stanford, CA 94305, USA
* Author to whom correspondence should be addressed.

Electronics 2023, 12(19), 4027; https://doi.org/10.3390/electronics12194027

Submission received: 30 July 2023 / Revised: 20 September 2023 / Accepted: 21 September 2023 / Published: 25 September 2023

(This article belongs to the Collection Graph Machine Learning)


Abstract

Deep neural networks, often equipped with powerful auto-optimization tools, find widespread use in diverse domains like NLP and computer vision. However, traditional neural architectures come with specific inductive biases, designed to reduce parameter search space, cut computational costs, or introduce domain expertise into the network design. In contrast, multilayer perceptrons (MLPs) offer greater freedom and lower inductive bias than convolutional neural networks (CNNs), making them versatile for learning complex patterns. Despite their flexibility, most neural architectures operate in a flat Euclidean space, which may not be optimal for various data types, particularly those with hierarchical correlations. In this paper, we move one step further by introducing the hyperbolic Res-MLP (HR-MLP), an architecture extending the attention-free MLP to a non-Euclidean space. HR-MLP leverages fully hyperbolic layers for feature embeddings and end-to-end image classification. Our novel Lorentz cross-patch and cross-channel layers enable direct hyperbolic operations with fewer parameters, facilitating faster training and superior performance compared to Euclidean counterparts. Experimental results on CIFAR10, CIFAR100, and MiniImageNet confirm HR-MLP’s competitive and improved performance.

Keywords:

deep neural networks; hyperbolic geometry; image classification; Lorentz model; graphs

1. Introduction

In recent years, significant advancements have been made in the development of neural network architectures with reduced hard-coded priors and inductive biases. Traditional approaches, such as manually designing fixed features, e.g., SIFT and LBP [1,2,3], have been replaced by automatic feature-learning strategies powered by more flexible architectures like convolutional neural networks (CNNs) [4]. More recently, vision transformers (ViTs) [5] have further pushed the boundaries by eliminating hard-coded decisions, like translation invariance and local connectivity, that are commonly found in CNNs. However, such models typically have high computational complexity, are extremely data-hungry, and lack interpretability owing to the complex interactions and attention patterns within the self-attention mechanism. At the same time, such models require considerable expert knowledge to adapt neural architectures [6,7] to different applications. All of these factors, coupled with improved training schemes, have facilitated the reemergence of pure multi-layer perceptron (MLP) architectures in computer vision tasks [8,9,10,11,12]. Previously, MLPs were largely overlooked in computer vision due to their computationally intensive nature. However, recent developments [8,9] have made it feasible to harness the potential of MLPs while achieving promising trade-offs between accuracy and design complexity in image classification; for instance, MLPs [8,9] have obtained very promising results when trained and evaluated on the large-scale ImageNet dataset [13].

However, it is worth noting that these neural network architectures, including MLPs, predominantly operate within Euclidean space. Euclidean space is the conventional generalization of our intuitive three-dimensional space, but it may not be the most suitable representation for all types of data [14]. For instance, complex datasets, such as social networks, human skeletons, sentences, and evolutionary relationships, often exhibit hierarchical structures, which leads to large distortion when such data are embedded in Euclidean space. Recognizing the limitations of Euclidean representation learning, researchers have explored alternative approaches that leverage hyperbolic spaces, which provide a natural fit for capturing the non-Euclidean nature of such data.

Hyperbolic deep learning [14], based on the negative curvature of hyperbolic spaces [15], has been successfully applied to various domains, including natural language processing (NLP) and graph analysis [16,17]. Hyperbolic embeddings have been employed for tasks such as text classification [18], entity typing [19], and word embeddings [20,21,22], as well as graph tasks such as node classification [17], graph classification [16,23], link prediction [17], and graph embedding [24]. While it may not be immediately apparent that images possess a hierarchical data structure, several studies [23,25,26,27,28] have demonstrated the effective use of hyperbolic spaces for learning image embeddings.

Images, by their nature, often possess inherent hierarchical structures. For example, in an image of a person, there are hierarchies ranging from pixels to body parts (e.g., limbs, face) and finally to the whole person. These hierarchical relationships are not easily modeled in a flat Euclidean space, where the volume of a ball grows only polynomially with its radius and thus leaves little room to embed tree-like structures with low distortion. When using Euclidean geometry, important contextual information and relative hierarchical importance may be lost, making it challenging to recognize and understand complex image features. Prior research, such as that of Khrulkov et al. (2020) [25], has compellingly demonstrated the presence of distinct hierarchical relationships within image features learned by widely adopted neural networks, like VGG. It is also worth noting that hierarchies and tree structures are common paradigms in human cognition for comprehending and recognizing the world. However, due to the challenges of extending hyperbolic neural networks to computer vision, most existing works [23,25] on hyperbolic spaces in computer vision rely on a hybrid framework, in which a Euclidean backbone first generates embeddings and a hyperbolic layer then performs the final prediction. While this approach has shown promise, it still operates within a hybrid setting that is hard to train. Therefore, fully hyperbolic networks that use hyperbolic operations and layers to learn feature embeddings remain to be explored in this domain.

This research aims to bridge this gap by developing a fully hyperbolic neural network architecture that does not rely on the tangent space and is specifically tailored for computer vision tasks. By embracing the intrinsic hierarchical structures within images and harnessing the power of hyperbolic spaces, we expect the proposed architecture to offer novel insights and potentially surpass traditional Euclidean models in certain scenarios. The contributions of this paper can be summarized as follows:

A fully hyperbolic deep neural architecture for image tasks, called hyperbolic ResMLP (HR-MLP), is presented to explore the potential of the hyperbolic perceptron for high-dimensional data.
The proposed HR-MLP includes Lorentz cross-patch and cross-channel layers, which are manifold-preserving neural operators.
Results on CIFAR10, CIFAR100, and MiniImageNet demonstrate performance comparable or superior to that of the Euclidean counterpart, while offering much better interpretability.

2. Related Works

2.1. Image Classification in Euclidean Space

Image classification is the process of categorizing and labeling images, which are commonly represented as grids of pixels or as vectors. Depending on whether class labels are available during training, image classification is generally either supervised or unsupervised [29]. Image classification models, which can be based on various machine learning methods or architectures, take an image as input and predict the class to which it belongs.

Before the advent of deep neural networks, especially convolutional neural networks (CNNs), traditional image classification methods relied heavily on hand-crafted features, like the histogram of oriented gradients (HOG), scale-invariant feature transform (SIFT), local binary patterns (LBP) [1,2,3,30], or bag-of-visual-words (BoVW) [31], to represent the input images. These features were then fed into a classifier, such as a support vector machine (SVM) [32] or a random forest [33], to predict the predefined labels. While traditional methods have been widely used for image classification, they struggle to capture rich, hierarchical representations and do not handle large-scale, complex datasets well. CNNs and other deep learning models have largely surpassed traditional methods in terms of accuracy and generalization, especially on large-scale datasets like ImageNet [13]. In the deep learning era, numerous neural architectures and techniques have been proposed to learn feature representations automatically from data, and in most cases, feature learning and class prediction are combined in an end-to-end fashion. In terms of neural architectures, CNNs [4] are now among the most commonly used networks for images; examples include residual networks (ResNets) [34] (which introduce skip connections to alleviate the vanishing gradient problem), inception networks [35] (which use multiple parallel convolutional layers of different sizes to capture features at various scales, promoting the extraction of both fine-grained and high-level contextual information), DenseNet [36], squeeze-and-excitation networks (SENets) [37], and EfficientNet [38] (which proposes a compound scaling method to strike an optimal balance between model size and performance).

Recently, to further improve the representation ability of neural networks, self-attention mechanisms were introduced. Dosovitskiy et al. [5] proposed an attention-based method termed the vision transformer (ViT). ViTs process images in a patch-based manner (treating the image as a sequence of tokens) and apply the self-attention mechanism to capture global correlations. This approach has achieved remarkable results on various image classification benchmarks. In contrast, residual multi-layer perceptrons (ResMLPs) [9] remove self-attention and are expected to provide a much more efficient way to learn rich features. In particular, they combine the strengths of MLPs and residual connections and give more flexibility to the learning process. These models achieve competitive performance in image classification while maintaining the simplicity and interpretability of MLPs.

In parallel with the design of neural architectures, new training techniques have also been proposed to improve classification performance. For instance, data augmentation techniques [39], which apply various transformations, are commonly used to enlarge the training data and improve the generalization ability of image classifiers. Transfer learning techniques [40], which reuse models such as VGG [41], ResNet [34], and InceptionNet [35] pre-trained on large-scale datasets like ImageNet [13], have been widely adopted for image classification. By leveraging features learned by these models, transfer learning enables effective classification even with limited labeled data. Since designing neural architectures requires considerable expertise, both in optimizing deep neural networks and in the specific domain (e.g., segmentation or different modalities), neural architecture search (NAS) [42] was introduced to automate the search for optimal network architectures for various datasets. NAS [42] has been used to discover novel architectures that achieve state-of-the-art performance in image classification tasks.

All of the methods discussed here operate in Euclidean space. These works highlight the extensive research efforts and advancements in image classification within Euclidean space, which have led to highly effective and efficient neural network models for a wide range of applications. However, to improve interpretability, several attempts have been made to construct neural networks in non-Euclidean space [23,25]; these largely rely on the tangent space, which leaves considerable room for improvement.

2.2. Hyperbolic Deep Learning

Hyperbolic deep learning [14] is a new field of deep learning that aims to learn compact and rich feature representations by using hyperbolic spaces as the underlying mathematical structure for representing and processing data. Unlike traditional deep learning methods that primarily operate in Euclidean spaces, hyperbolic deep learning leverages the unique properties of hyperbolic geometry to handle data with hierarchical structures, non-Euclidean relationships [16], and complex interconnectedness [14,17]. Hyperbolic deep learning has emerged as a promising paradigm in various domains, including graph analysis [16,17,23,43], natural language processing (NLP) [18,19,20,21], and other complex data structures. Complex hierarchical relationships are prevalent in social networks, biological networks, and other graph data [14]. By leveraging the negative curvature of hyperbolic spaces, hyperbolic deep learning has shown significant success in graph analysis tasks such as node classification [17], graph classification [16,23], and link prediction [17], demonstrating significant potential compared with its Euclidean counterparts. In the field of NLP, for instance, hyperbolic embeddings can be extremely compact (even two-dimensional) [22] and have demonstrated their efficacy in tasks like text classification, entity typing, word embeddings, and sentence representations.

In recent years, hyperbolic embeddings have garnered attention in computer vision due to their ability to capture hierarchical relationships among data points. Several works have explored the application of hyperbolic learning in classical computer vision tasks.

One of the early attempts at utilizing hyperbolic embeddings for computer vision tasks comes from Khrulkov et al. [25]. They argue that hierarchical relations between images are common in computer vision tasks, such as image retrieval and classification. In their work, they showed that feature embeddings of popular architectures, including ResNet [34], VGG19 [41], and InceptionV3 [35], exhibit hyperbolicity on various datasets like CIFAR10, CIFAR100 [44], CUB [45], and MiniImageNet. However, their approach utilized hyperbolic layers along with Euclidean backbones, thus only partially leveraging the full potential of hyperbolic geometry.

Similarly, in [23], a Poincaré ST-GCN was introduced for skeleton-based action recognition. They modeled the input sequence (skeletons) as a graph in hyperbolic space, demonstrating the benefits of hyperbolic geometry for capturing spatial relationships. Nevertheless, their work mainly focused on applying hyperbolic transformations to specific components of the architecture without building a complete hyperbolic neural network.

Additionally, the work in [46] proposed hyperbolic manifolds as an alternative for image segmentation, enabling pixel-level classification with a hierarchical formulation. It showed that hyperbolic spaces offer natural uncertainty estimation measures and improved zero-label generalization compared with Euclidean counterparts. However, similar to the previous works, it did not fully explore the potential of building a fully hyperbolic neural network for computer vision tasks.

While the mentioned works have made significant strides in applying hyperbolic learning to computer vision, they have primarily used hyperbolic layers or transformations in conjunction with traditional Euclidean networks, limiting the realization of the full benefits of hyperbolic geometry. As a result, it is very valuable to develop a complete hyperbolic neural network for computer vision tasks. Building such a full hyperbolic neural network has the potential to enhance feature representations, improve generalization, and enable more efficient computations, opening up new avenues for computer vision research.

3. Preliminary

Hyperbolic geometry [14], or Lobachevsky–Bolyai–Gauss geometry, is a non-Euclidean geometry with constant negative sectional curvature. It satisfies all of Euclid's postulates except the fifth (parallel) postulate, as illustrated in Figure 1.

Unlike Euclidean geometry, where the angles of a triangle always sum to 180 degrees, in hyperbolic geometry they sum to less than 180 degrees. This property is a consequence of the negative curvature of the hyperbolic plane. Moreover, the circumference and area of a hyperbolic circle grow exponentially with its radius, so the space expands rapidly as one moves away from a fixed point. The distance between two points in hyperbolic space is measured along the unique geodesic (the hyperbolic equivalent of a straight line) connecting them. Hyperbolic distance is often used to define similarity measures in hyperbolic embeddings. Hyperbolic geometry has found applications in various scientific fields, including physics, cosmology, computer graphics, and machine learning. In machine learning, hyperbolic geometry has been used to design novel deep learning models, particularly in hyperbolic deep learning, to handle data with hierarchical or tree-like structures. By using hyperbolic spaces, current studies aim to improve the representation and processing of complex data, such as graphs, hierarchical text data, and relational data, for which Euclidean spaces may not be the most suitable choice. Hyperbolic geometry provides a powerful and elegant mathematical framework for understanding and analyzing such intricate structures more efficiently and expressively.
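To make the exponential growth concrete, the short Python sketch below compares the circumference of a circle of radius r in the Euclidean plane (2πr) with that of a circle in the hyperbolic plane of curvature −1 (2π sinh r). These are standard textbook formulas stated here as background, not results from this paper, and the function names are illustrative.

```python
import math


def euclidean_circumference(r: float) -> float:
    """Circumference of a circle of radius r in the Euclidean plane."""
    return 2.0 * math.pi * r


def hyperbolic_circumference(r: float) -> float:
    """Circumference of a circle of radius r in the hyperbolic plane (curvature -1)."""
    return 2.0 * math.pi * math.sinh(r)


for r in (1.0, 2.0, 5.0, 10.0):
    ratio = hyperbolic_circumference(r) / euclidean_circumference(r)
    print(f"r = {r:4.1f}: hyperbolic / Euclidean circumference = {ratio:.1f}")
```

The ratio sinh(r)/r is close to 1 for small radii but grows roughly like e^r / (2r), which is why a hyperbolic disk has room for trees whose number of nodes grows exponentially with depth.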

3.1. Topological Spaces and Manifolds

A topological space [14] is a set equipped with a collection of open sets that satisfy certain properties. Open sets are subsets of the space that are considered “open” in the sense that they contain a neighborhood around each of their points. These open sets must fulfill three conditions:

The entire space and the empty set must both be open.
The intersection of any finite number of open sets must be open.
The union of any number of open sets must be open.
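For a finite set, the three conditions can be checked mechanically. The Python sketch below (the function name `is_topology` is our own, for illustration) tests whether a collection of subsets forms a topology. For a finite collection, closure under pairwise intersections and unions already implies closure under all finite intersections and all unions, so checking pairs is enough.

```python
from itertools import combinations


def is_topology(space, opens):
    """Check the three open-set axioms for a finite collection of subsets of `space`."""
    space = frozenset(space)
    opens = {frozenset(s) for s in opens}
    # Every candidate open set must be a subset of the space.
    if any(not s <= space for s in opens):
        return False
    # Condition 1: the whole space and the empty set are open.
    if space not in opens or frozenset() not in opens:
        return False
    # Conditions 2 and 3: for a finite collection, closure under pairwise
    # intersections and unions implies closure under all finite intersections
    # and all unions (any union of finitely many sets can be built pairwise).
    for a, b in combinations(opens, 2):
        if (a & b) not in opens or (a | b) not in opens:
            return False
    return True


print(is_topology({1, 2, 3}, [set(), {1}, {1, 2}, {1, 2, 3}]))  # True
print(is_topology({1, 2, 3}, [set(), {1}, {2}, {1, 2, 3}]))     # False: {1} | {2} is missing
```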

Based on this, we can define a manifold. A $d$-dimensional manifold $\mathcal{M}_{d}$ (in the settings considered here, one embedded in $\mathbb{R}^{d+1}$) is a topological space that can be locally approximated by the $d$-dimensional Euclidean space $\mathbb{R}^{d}$: for any point $x \in \mathcal{M}_{d}$, there is a homeomorphism between a neighborhood of $x$ and $\mathbb{R}^{d}$. Lines and circles are examples of one-dimensional manifolds. Planes and spheres are examples of two-dimensional manifolds, which are called surfaces. The notion of a manifold generalizes surfaces to any dimension $d$. The tangent space $T_{x}\mathcal{M}_{d}$ at a point $x \in \mathcal{M}_{d}$ is a $d$-dimensional hyperplane, embedded in $\mathbb{R}^{d+1}$, that locally approximates the manifold $\mathcal{M}_{d}$ around $x$.

3.2. Isometric Models in Hyperbolic Space

Hyperbolic space is a homogeneous space with constant negative curvature. It is a smooth Riemannian manifold and, as such, a locally Euclidean space. Hyperbolic space can be represented by five commonly used isometric models [47,48]: the Lorentz (hyperboloid) model, the Poincaré ball model, the Poincaré half-space model, the Klein model, and the hemisphere model. We detail these models below. For clarity, and without loss of generality, we fix the radius of each model to 1 (i.e., curvature −1).

3.2.1. Lorentz Model

The Lorentz model $\mathcal{L}^{n}$ of an $n$-dimensional hyperbolic space is a manifold embedded in the $(n+1)$-dimensional Minkowski space. It is defined as the upper sheet of a two-sheeted $n$-dimensional hyperboloid with the metric $g^{\mathcal{L}}$:

$$\mathcal{L}^{n} = \{\mathbf{x} = (x^{0}, \ldots, x^{n}) \in \mathbb{R}^{n+1} : \langle \mathbf{x}, \mathbf{x} \rangle_{\mathcal{L}} = -1,\ x^{0} > 0\}, \tag{1}$$

in which $\langle \cdot, \cdot \rangle_{\mathcal{L}}$ denotes the Lorentzian inner product:

$$\langle \mathbf{x}, \mathbf{y} \rangle_{\mathcal{L}} = \mathbf{x}^{\top} g^{\mathcal{L}} \mathbf{y} = -x^{0} y^{0} + \sum_{i=1}^{n} x^{i} y^{i}, \quad \mathbf{x}, \mathbf{y} \in \mathbb{R}^{n+1}, \tag{2}$$

where $g^{\mathcal{L}} = \mathrm{diag}(-1, 1, \ldots, 1)$ is a diagonal matrix whose entries are 1 except for the first, which is −1. For any $\mathbf{x} \in \mathcal{L}^{n}$, it follows that $x^{0} = \sqrt{1 + \sum_{i=1}^{n} (x^{i})^{2}}$. The distance in the Lorentz model is defined as

$$d(\mathbf{x}, \mathbf{y}) = \operatorname{arcosh}\left(-\langle \mathbf{x}, \mathbf{y} \rangle_{\mathcal{L}}\right). \tag{3}$$

The main advantage of this model is that it provides an efficient space for Riemannian optimization. An additional advantage is that its distance function is more numerically stable than that of the Poincaré model, whose distance (Equation (7)) involves division by $(1 - \|\mathbf{x}\|^{2})(1 - \|\mathbf{y}\|^{2})$, a quantity that approaches zero near the boundary of the ball.
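The definitions above translate directly into a few lines of Python. This is a minimal sketch with our own helper names, not the paper's code: it lifts Euclidean coordinates onto the hyperboloid, checks the constraint of Equation (1), and evaluates Equation (3).

```python
import math


def lorentz_inner(x, y):
    """Lorentzian inner product of Eq. (2): -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))


def lift(spatial):
    """Place spatial coordinates on L^n by setting x0 = sqrt(1 + sum_i xi^2)."""
    return [math.sqrt(1.0 + sum(c * c for c in spatial)), *spatial]


def lorentz_distance(x, y):
    """Geodesic distance of Eq. (3): arcosh(-<x, y>_L)."""
    # Rounding can push -<x, y>_L slightly below 1; clamp to stay in acosh's domain.
    return math.acosh(max(1.0, -lorentz_inner(x, y)))


origin = lift([0.0, 0.0])                 # (1, 0, 0)
p = lift([math.sinh(2.0), 0.0])           # (cosh 2, sinh 2, 0)
print(lorentz_inner(p, p))                # -1 up to rounding: p lies on L^2
print(lorentz_distance(origin, p))        # 2.0 up to rounding
```

The clamp in `lorentz_distance` is a design choice for floating-point safety; mathematically, $-\langle \mathbf{x}, \mathbf{y} \rangle_{\mathcal{L}} \ge 1$ for any two points on $\mathcal{L}^{n}$.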

3.2.2. Klein Model

The Klein model is also known as the Beltrami–Klein model, named after the Italian mathematician Eugenio Beltrami and the German mathematician Felix Klein. The Klein model of hyperbolic space is a subset of $\mathbb{R}^{n}$, as illustrated in Figure 2. It is the isometric image of the Lorentz model under gnomonic (central) projection [48]: each point $\mathbf{x} \in \mathcal{L}^{n}$ is mapped to the hyperplane $x^{0} = 1$ along the ray emanating from the origin. Formally, the Klein model is defined as

$$\mathbb{K}^{n} = \{\mathbf{x} \in \mathbb{R}^{n} : \|\mathbf{x}\| < 1\}. \tag{4}$$

The distance is

$$d(\mathbf{x}, \mathbf{y}) = \operatorname{arcosh}\left(\frac{1 - \langle \mathbf{x}, \mathbf{y} \rangle}{\sqrt{(1 - \|\mathbf{x}\|^{2})(1 - \|\mathbf{y}\|^{2})}}\right). \tag{5}$$

A straight line in the Klein model, e.g., line $\overline{AB}$ in the second panel from the left of Figure 2, is the intersection of the disk with a plane through the origin of the ambient space; it is therefore a straight Euclidean chord, just as in Euclidean space. For this reason, the Klein model is commonly used to compute midpoints. However, the model is not conformal, which means that angles and circles are distorted.
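The sketch below converts between the Lorentz and Klein models with the central projection described above and checks numerically that Equation (5) agrees with the Lorentz distance of Equation (3). It is plain Python with our own illustrative function names, not code from the paper.

```python
import math


def _sq_norm(v):
    return sum(c * c for c in v)


def lorentz_inner(x, y):
    """Eq. (2): -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))


def lorentz_to_klein(x):
    """Central projection onto x0 = 1 along the ray from the origin: k = (x1, ..., xn) / x0."""
    return [c / x[0] for c in x[1:]]


def klein_to_lorentz(k):
    """Inverse map: x = (1, k) / sqrt(1 - ||k||^2)."""
    s = 1.0 / math.sqrt(1.0 - _sq_norm(k))
    return [s, *(s * c for c in k)]


def klein_distance(x, y):
    """Eq. (5): arcosh((1 - <x, y>) / sqrt((1 - ||x||^2)(1 - ||y||^2)))."""
    dot = sum(a * b for a, b in zip(x, y))
    ratio = (1.0 - dot) / math.sqrt((1.0 - _sq_norm(x)) * (1.0 - _sq_norm(y)))
    return math.acosh(max(1.0, ratio))


a, b = [0.3, -0.2], [-0.5, 0.4]
print(klein_distance(a, b))
print(math.acosh(-lorentz_inner(klein_to_lorentz(a), klein_to_lorentz(b))))  # same value
```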

3.2.3. Poincaré Model

The Poincaré model, as shown in Figure 2, is obtained by projecting each point of $\mathcal{L}^{n}$ onto the hyperplane $x^{0} = 0$ along the ray emanating from $(-1, 0, \ldots, 0)$. The Poincaré model $\mathbb{B}$ is a manifold equipped with a Riemannian metric $g^{\mathbb{B}}$. This metric is conformal to the Euclidean metric $g^{E} = I_{n}$ with the conformal factor $\lambda_{\mathbf{x}} = \frac{2}{1 - \|\mathbf{x}\|^{2}}$, i.e., $g^{\mathbb{B}} = \lambda_{\mathbf{x}}^{2} g^{E}$. Formally, an $n$-dimensional Poincaré unit ball (manifold) is defined as

$$\mathbb{B}^{n} = \{\mathbf{x} \in \mathbb{R}^{n} : \|\mathbf{x}\| < 1\}, \tag{6}$$

where $\|\cdot\|$ denotes the Euclidean norm. The distance between $\mathbf{x}, \mathbf{y} \in \mathbb{B}^{n}$ is defined as

$$d(\mathbf{x}, \mathbf{y}) = \operatorname{arcosh}\left(1 + 2\,\frac{\|\mathbf{x} - \mathbf{y}\|^{2}}{(1 - \|\mathbf{x}\|^{2})(1 - \|\mathbf{y}\|^{2})}\right). \tag{7}$$
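As with the Klein model, the projection and Equation (7) can be checked against the Lorentz distance. The sketch below uses our own function names and also evaluates the conformal factor, which blows up near the unit sphere; this is the division that makes Poincaré distances numerically delicate near the boundary.

```python
import math


def _sq_norm(v):
    return sum(c * c for c in v)


def lorentz_inner(x, y):
    """Eq. (2): -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))


def lorentz_to_poincare(x):
    """Project from (-1, 0, ..., 0) onto x0 = 0: p = (x1, ..., xn) / (1 + x0)."""
    return [c / (1.0 + x[0]) for c in x[1:]]


def poincare_to_lorentz(p):
    """Inverse projection: x = (1 + ||p||^2, 2p) / (1 - ||p||^2)."""
    s = _sq_norm(p)
    return [(1.0 + s) / (1.0 - s), *(2.0 * c / (1.0 - s) for c in p)]


def conformal_factor(p):
    """lambda_p = 2 / (1 - ||p||^2); it grows without bound as ||p|| -> 1."""
    return 2.0 / (1.0 - _sq_norm(p))


def poincare_distance(p, q):
    """Eq. (7)."""
    diff = sum((a - b) ** 2 for a, b in zip(p, q))
    return math.acosh(1.0 + 2.0 * diff / ((1.0 - _sq_norm(p)) * (1.0 - _sq_norm(q))))


p, q = [0.1, 0.2], [-0.6, 0.3]
d_ball = poincare_distance(p, q)
d_lorentz = math.acosh(-lorentz_inner(poincare_to_lorentz(p), poincare_to_lorentz(q)))
print(d_ball, d_lorentz)  # the two values agree up to floating-point rounding
```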

3.2.4. Poincaré Half-Plane Model

The closely related Poincaré half-plane model of hyperbolic space is a Riemannian manifold $(\mathbb{H}^{n}, g^{\mathbb{H}})$, where

$$\mathbb{H}^{n} = \{\mathbf{x} \in \mathbb{R}^{n} : x_{n} > 0\} \tag{8}$$

is the upper half-space of an $n$-dimensional Euclidean space. The metric $g^{\mathbb{H}}$ is obtained by scaling the Euclidean metric: $g^{\mathbb{H}} = \frac{g^{E}}{x_{n}^{2}}$. The model $\mathbb{H}^{n}$ can be obtained by inverting the Poincaré model $\mathbb{B}^{n}$ with respect to a circle that has a radius twice that of $\mathbb{B}^{n}$. The distance is

$$d(\mathbf{x}, \mathbf{y}) = \operatorname{arcosh}\left(1 + \frac{\|\mathbf{x} - \mathbf{y}\|^{2}}{2 x_{n} y_{n}}\right). \tag{9}$$
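A quick check of Equation (9) in plain Python (the function name is illustrative). Along a vertical line the formula reduces to ln(y₂/y₁), so the points (0, 1) and (0, e) are exactly one unit apart, while the same horizontal Euclidean shift covers far more hyperbolic distance close to the boundary $x_{n} = 0$, reflecting the $1/x_{n}^{2}$ scaling of the metric.

```python
import math


def half_space_distance(x, y):
    """Eq. (9): arcosh(1 + ||x - y||^2 / (2 x_n y_n)); the last coordinate is x_n."""
    if x[-1] <= 0.0 or y[-1] <= 0.0:
        raise ValueError("points must lie in the upper half-space (x_n > 0)")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.acosh(1.0 + sq_dist / (2.0 * x[-1] * y[-1]))


# Vertical geodesic: the distance is ln(y2 / y1), so this prints 1.0 up to rounding.
print(half_space_distance([0.0, 1.0], [0.0, math.e]))

# The same unit horizontal shift, measured at two heights.
print(half_space_distance([0.0, 1.0], [1.0, 1.0]))   # arcosh(1.5)
print(half_space_distance([0.0, 0.1], [1.0, 0.1]))   # about arcosh(51), much larger
```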

3.2.5. Hemisphere Model

The hemisphere model, also called the Jemisphere model, is not as common as the previous four models; instead, it is mainly employed as a useful tool for visualizing transformations between the other models. The hemisphere model is defined as

$$\mathbb{J}^{n} = \{\mathbf{x} = (x_{0}, \ldots, x_{n}) \in \mathbb{R}^{n+1} : \|\mathbf{x}\| = 1,\ x_{0} > 0\}. \tag{10}$$

The five isometric models [47,48] are embedded submanifolds of ambient real vector spaces, and they are equivalent models of hyperbolic space: closed-form expressions exist for mapping between any two of them. Figure 3 displays the models in two dimensions and illustrates their relationships.
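The closed-form maps can be chained. The sketch below (our own function names) implements the standard Poincaré–Klein conversions and checks that going through the hemisphere of Equation (10), by a vertical lift from the Klein disk followed by stereographic projection from (−1, 0, …, 0) onto the Poincaré ball, gives the same result. Placing the hemisphere's height on the $x_{0}$ coordinate follows Equation (10); that these particular projections are the ones drawn in Figure 3 is our assumption.

```python
import math


def _sq_norm(v):
    return sum(c * c for c in v)


def poincare_to_klein(p):
    """k = 2p / (1 + ||p||^2)."""
    s = 1.0 + _sq_norm(p)
    return [2.0 * c / s for c in p]


def klein_to_poincare(k):
    """p = k / (1 + sqrt(1 - ||k||^2))."""
    s = 1.0 + math.sqrt(1.0 - _sq_norm(k))
    return [c / s for c in k]


def klein_to_hemisphere(k):
    """Vertical lift onto the unit hemisphere of Eq. (10): (x_0, x_1, ..., x_n) = (sqrt(1 - ||k||^2), k)."""
    return [math.sqrt(1.0 - _sq_norm(k)), *k]


def hemisphere_to_poincare(j):
    """Stereographic projection from (-1, 0, ..., 0) onto x_0 = 0: p = (x_1, ..., x_n) / (1 + x_0)."""
    return [c / (1.0 + j[0]) for c in j[1:]]


k = [0.3, 0.5]
print(klein_to_poincare(k))
print(hemisphere_to_poincare(klein_to_hemisphere(k)))  # same point, via the hemisphere
```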

4. Methodology

In this section, we describe our method, the efficient hyperbolic residual MLP (HR-MLP). The entire neural architecture is illustrated in Figure 4. An input image is first divided into patches, as in ViT, ResMLP, g-MLP, and MLP-Mixer [5,8,9,10], and then passed through a Euclidean linear layer to create patch embeddings. The Euclidean embeddings are then mapped to the hyperbolic domain via the exponential map; we use the Lorentz model, considering the trade-off between representation ability and optimization. The hyperbolic embeddings are passed to a block, dubbed the HR block, which contains Lorentz linear layers for linear transformations in the hyperbolic domain. After passing through $N$ stacked HR blocks ($N\times$), the embeddings are pooled using adaptive average pooling [49]. Like HyboNet [43], we use a hyperbolic MLP head to classify the features learned in the Lorentz model. In the following, we detail each neural component.

4.1. Lorentz Linear Embedding

Given an image $\mathbf{x} \in \mathbb{R}^{H \times W \times D}$, we divide it into overlapping patches [50] using a zero-padded convolution, which serves as our patch embedding layer. Specifically, the image is fed to a convolution with stride $S$, kernel size $2S - 1$, padding $S - 1$, and $C'$ output channels. The output size is $\frac{H}{S} \times \frac{W}{S} \times C'$ (assuming $H$ and $W$ are divisible by $S$). We reshape the output to $P \times C'$, where $P = \frac{H}{S} \times \frac{W}{S}$ is the number of patches. Let $F$ be the feature produced by the patch embedding. Because $F$ is still in Euclidean space, we must lift it to obtain the corresponding feature on the Lorentz manifold. To this end, we apply the exponential map

$$\mathrm{Exp}_{\mathbf{o}}(F) = \cosh\left(\|F\|_{\mathcal{L}}\right)\mathbf{o} + \sinh\left(\|F\|_{\mathcal{L}}\right)\frac{F}{\|F\|_{\mathcal{L}}}, \tag{11}$$

where $\mathbf{o} = (1, 0, \ldots, 0)$ is the origin of the Lorentz model and $\|F\|_{\mathcal{L}} = \sqrt{\langle F, F \rangle_{\mathcal{L}}}$. Here, we treat the feature as a vector in the tangent space at $\mathbf{o}$ (whose time component is zero). In this way, the feature representation $F$ is mapped onto the Lorentz manifold.
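The sketch below checks the shape arithmetic and Equation (11) in plain Python. The 32 × 32 input (CIFAR-sized) and stride S = 4 are hypothetical values chosen for illustration, not the paper's reported settings, and the function names are ours.

```python
import math


def conv_output_size(size: int, stride: int) -> int:
    """Spatial output size of a convolution with kernel 2S-1, padding S-1, and stride S."""
    kernel, padding = 2 * stride - 1, stride - 1
    return (size + 2 * padding - kernel) // stride + 1


def exp_map_origin(f):
    """Eq. (11) for the tangent vector (0, f) at o = (1, 0, ..., 0).

    For such a vector ||(0, f)||_L equals the Euclidean norm of f, so the result is
    (cosh||f||, sinh||f|| * f / ||f||).
    """
    norm = math.sqrt(sum(c * c for c in f))
    if norm == 0.0:
        return [1.0] + [0.0] * len(f)  # Exp_o(0) = o
    scale = math.sinh(norm) / norm
    return [math.cosh(norm), *(scale * c for c in f)]


# Hypothetical setting: a 32 x 32 image and stride S = 4.
side = conv_output_size(32, 4)
print(side, side * side)            # 8 patches per side, P = 64 patches

y = exp_map_origin([0.3, -1.1, 0.5])
print(-y[0] ** 2 + sum(c * c for c in y[1:]))   # -1 up to rounding: y lies on the manifold
```

Because the exponential map at the origin preserves distances along rays, the lifted point lies at hyperbolic distance $\|F\|$ from $\mathbf{o}$, i.e., $\operatorname{arcosh}(y^{0}) = \|F\|$.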

4.2. Lorentz Linear Layer

Basic operations such as addition, multiplication, and pooling are the building blocks of neural network layers, and they are easy and efficient to apply in Euclidean space. Once the features $F$ have been mapped into hyperbolic space, however, these Euclidean operations can no longer be applied directly, as most of them are not manifold-preserving. We therefore need Lorentz operations that keep the learned features on the manifold after each transformation.
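A quick numerical check shows why. In the sketch below (illustrative helper names), two points are placed on the hyperboloid of Equation (1) and added coordinate-wise. Because $\langle \mathbf{a}, \mathbf{b} \rangle_{\mathcal{L}} \le -1$ for any two points of $\mathcal{L}^{n}$, the sum $\mathbf{z}$ always satisfies $\langle \mathbf{z}, \mathbf{z} \rangle_{\mathcal{L}} = -2 + 2\langle \mathbf{a}, \mathbf{b} \rangle_{\mathcal{L}} \le -4$, so it never lands back on the manifold.

```python
import math


def lorentz_inner(x, y):
    """Eq. (2): -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))


def lift(spatial):
    """Point on L^n with the given spatial coordinates (Eq. 1)."""
    return [math.sqrt(1.0 + sum(c * c for c in spatial)), *spatial]


a = lift([0.5, -1.0])
b = lift([1.5, 0.25])
z = [ai + bi for ai, bi in zip(a, b)]   # naive Euclidean addition

print(lorentz_inner(a, a))  # -1 up to rounding: a is on the manifold
print(lorentz_inner(z, z))  # at most -4: z has left the manifold
```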

An alternative route is to map each point in hyperbolic space to a tangent space, which is a Euclidean subspace, so that Euclidean neural operations can be used there. Existing works [51,52] formalize most Euclidean operations for hyperbolic networks by transforming features between hyperbolic spaces and tangent spaces via logarithmic and exponential maps, such that neural network operations can be performed directly in the tangent spaces. However, tangent spaces are not the optimal choice for several reasons. First, the composition of these functions is complicated, as features must move back and forth between the two spaces in every layer. Second, such transformations can produce unbounded values, which significantly reduces the stability of the model. Finally, the tangent space is only a linear approximation of hyperbolic space, so it does not make full use of the advantages of non-Euclidean geometry.
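For contrast, the sketch below shows the tangent-space route of [51,52] in its simplest form, with the origin as the base point: a point is mapped to the tangent space with the logarithmic map, multiplied by an ordinary matrix, and mapped back with the exponential map. Function names are illustrative, and biases and nonlinearities are omitted. Note the two map evaluations per layer, which is the back-and-forth criticized above.

```python
import math


def exp_origin(u):
    """Exp map at o = (1, 0, ..., 0) for the tangent vector (0, u)."""
    n = math.sqrt(sum(c * c for c in u))
    if n == 0.0:
        return [1.0] + [0.0] * len(u)
    return [math.cosh(n), *(math.sinh(n) / n * c for c in u)]


def log_origin(x):
    """Log map at o (inverse of exp_origin): returns the spatial part of the tangent vector."""
    dist = math.acosh(max(1.0, x[0]))                    # d(o, x) = arcosh(x0)
    spatial_norm = math.sqrt(sum(c * c for c in x[1:]))  # equals sinh(dist) on the manifold
    if spatial_norm == 0.0:
        return [0.0] * (len(x) - 1)
    return [dist / spatial_norm * c for c in x[1:]]


def tangent_linear(W, x):
    """Tangent-space layer sketch: log map -> Euclidean matrix-vector product -> exp map."""
    u = log_origin(x)
    return exp_origin([sum(w * c for w, c in zip(row, u)) for row in W])


x = exp_origin([0.4, -0.7])                  # a point on L^2
W = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]     # maps 2 spatial dims to 3
y = tangent_linear(W, x)                     # a point on L^3
print(-y[0] ** 2 + sum(c * c for c in y[1:]))  # -1 up to rounding
```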

To avoid transformations between hyperbolic and tangent spaces, and inspired by Chen et al. [43], we propose a Lorentz (hyperboloid) linear layer, $\mathrm{HL}(\cdot)$, that operates fully in hyperbolic space by formalizing Lorentz manifold-preserving operations for neural networks without using tangent spaces. Taking inspiration from the theory of special relativity, which uses Minkowski space (the Lorentz model), our framework selects the Lorentz model as the feature space. The operations, including the linear layer, the attention layer, etc., are formalized via a relaxation of the Lorentz transformations to build hyperbolic neural networks. Additionally, Chen et al. [43] showed that performing a linear transformation in the tangent space at the origin of hyperbolic space [51,52] is equivalent to performing a Lorentz rotation with relaxed restrictions. Therefore, by using this Lorentz rotation, we can build a Lorentz linear layer in a fully hyperbolic space, which is faster and more stable and can achieve comparable or even better results than previous methods.

Here, we show how to build the Lorentz linear layer. As shown in Figure 5, a linear layer can simply be a mapping matrix $f \in \mathbb{R}^{(m+1) \times (n+1)}$ that projects a feature $F \in \mathbb{R}^{n+1}$ from $n+1$ to $m+1$ dimensions. For features on the manifold, we can further decouple the linear operation into a Lorentz boost part and a rotation part, represented by $\mathbf{v} \in \mathbb{R}^{n+1}$ and $W \in \mathbb{R}^{m \times (n+1)}$, respectively. In this way, the learnable operation acts only on the rotation part, while the extra boost part works as a regularizer so that the features always remain on the Lorentz manifold. The general formula is as follows:

$$\mathbf{y} = \mathrm{HL}(F) = \begin{bmatrix} \sqrt{\|\phi(WF, \mathbf{v})\|^{2} - 1/K} \\ \phi(WF, \mathbf{v}) \end{bmatrix}, \tag{12}$$

where $F \in \mathcal{L}_{K}^{n}$ is the feature input from the previous layer, $\mathcal{L}_{K}^{n} = \{\mathbf{x} : \langle \mathbf{x}, \mathbf{x} \rangle_{\mathcal{L}} = 1/K,\ x^{0} > 0\}$ is the Lorentz model with curvature $K < 0$ ($K = -1$ recovers Equation (1)), $\mathbf{v} \in \mathbb{R}^{n+1}$ and $W \in \mathbb{R}^{m \times (n+1)}$ are the linear operations mentioned above, and $\phi$ denotes the other operations in the linear layer, e.g., dropout or an activation function. This formula is derived from the Lorentz boost and the Lorentz rotation, which together form the polar decomposition of a Lorentz transformation [53]. One can verify that $\langle \mathbf{y}, \mathbf{y} \rangle_{\mathcal{L}} = -(\|\phi\|^{2} - 1/K) + \|\phi\|^{2} = 1/K$ for any input, so the output always lies on the manifold. Compared with the hyperbolic linear layers of Ganea et al. [51] and Nickel and Kiela [52], this linear layer is far more expressive, efficient, and stable, as it does not use complicated logarithmic and exponential maps. Chen et al. [43] dubbed this transformation a pseudo-Lorentz rotation, as it is a relaxation of the Lorentz transformation.
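The sketch below implements Equation (12) in plain Python to show the manifold-preserving step: whatever the spatial output $\phi(WF)$ is, recomputing the time coordinate as $\sqrt{\|\phi\|^{2} - 1/K}$ puts the result on $\mathcal{L}_{K}$. It is a simplified sketch under stated assumptions, not the paper's implementation: the boost vector $\mathbf{v}$ is not modeled, $\phi$ defaults to the identity (a ReLU is passed in the usage example), and the function names and numbers are hypothetical.

```python
import math


def hl_layer(W, F, K=-1.0, phi=None):
    """Sketch of Eq. (12): y = [sqrt(||phi(W F)||^2 - 1/K), phi(W F)].

    W: m x (n+1) matrix (list of rows) applied to the full Lorentz vector F.
    Simplifications (our assumptions): the boost vector v is not modeled, and
    phi defaults to the identity.
    """
    if K >= 0.0:
        raise ValueError("curvature K must be negative")
    u = [sum(w * f for w, f in zip(row, F)) for row in W]
    if phi is not None:
        u = phi(u)
    time = math.sqrt(sum(c * c for c in u) - 1.0 / K)
    return [time, *u]


def lorentz_inner(x, y):
    """Eq. (2): -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))


def relu(u):
    return [max(0.0, c) for c in u]


F = [math.sqrt(1.0 + 0.5 ** 2 + 1.0 ** 2), 0.5, -1.0]      # a point on L^2 (K = -1)
W = [[0.2, 1.0, -0.5], [0.3, 0.1, 0.8], [-0.4, 0.6, 0.0]]  # m = 3 spatial outputs

y = hl_layer(W, F, phi=relu)
print(lorentz_inner(y, y))        # -1 up to rounding (= 1/K for K = -1)
y2 = hl_layer(W, F, K=-0.5)
print(lorentz_inner(y2, y2))      # -2 up to rounding (= 1/K for K = -0.5)
```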

4.3. Lorentz Cross-Channel and Cross-Patch Layers

Here we describe how to construct the attention-free HR block, illustrated in Figure 6. Attention-based methods often involve costly computations to calculate the correlations between long-range positions. MLPs avoid this burden and offer a much more efficient way to aggregate information. In particular, cross-channel and cross-patch layers are the fundamental components of MLP-based architectures such as ResMLP and its variants [9].

The cross-channel and cross-patch layers are two all-MLP functional modules that use linear layers to update feature information. They consist of multiple layers of identical size, so they do not change the feature resolution. Viewing the features as a $P \times C'$ matrix (patches × channels), the cross-channel layer, also called the channel-mixing MLP, acts on each row (the channel vector of a single patch) and exchanges information across channels; its weights are shared across all patches. In contrast, the cross-patch layer, also called the token-mixing MLP, acts on each column (a single channel across all patches) and exchanges information across patches; its weights are shared across all channels. By combining these two layers, our method offers a significantly more efficient way to update features than attention-based methods.
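The distinction between the two layers comes down to which side of the $P \times C'$ feature matrix the weight matrix multiplies. The Euclidean sketch below (plain Python, illustrative names) makes this explicit; each layer is reduced to a single linear map, with biases, activations, and residual connections omitted for brevity.

```python
def matmul(A, B):
    """Plain-Python matrix product of list-of-rows matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def cross_patch(X, A):
    """Token mixing: X' = A X with A of shape P x P.

    Each output patch is a weighted combination of all patches; the same weights
    are applied to every channel (column).
    """
    return matmul(A, X)


def cross_channel(X, B):
    """Channel mixing: X' = X B with B of shape C' x C'.

    Each patch's channel vector (row) is transformed; the same weights are shared
    by all patches.
    """
    return matmul(X, B)


X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # P = 3 patches, C' = 2 channels
A = [[1 / 3] * 3 for _ in range(3)]         # average over patches
B = [[0.0, 1.0], [1.0, 0.0]]                # swap the two channels

print(cross_patch(X, A))     # every row becomes the patch average, about [3.0, 4.0]
print(cross_channel(X, B))   # [[2.0, 1.0], [4.0, 3.0], [6.0, 5.0]]
```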

Our Lorentz cross-channel layer applies the Lorentz linear layer to all patches independently, and its architecture closely mirrors the Euclidean one. The only difference is that the Lorentz linear layer $\mathrm{HL}(\cdot)$ is used so that the output features remain on the manifold. Likewise, the Lorentz cross-patch layer extends its Euclidean counterpart and transforms all channels independently; it also uses $\mathrm{HL}(\cdot)$ so that the module is manifold-preserving.
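To illustrate what manifold-preserving means at the block level, the sketch below treats each patch as a point on the hyperboloid (curvature $-1$). The cross-channel step applies a Lorentz-style linear map to every patch independently. How the cross-patch step keeps features on the manifold is not spelled out in this section, so the version below is an assumption: it mixes the spatial coordinates across patches and then recomputes each patch's time coordinate. All function names are hypothetical.

```python
import math


def lorentz_inner(x, y):
    """Lorentzian inner product <x, y>_L = -x_0 y_0 + sum_i x_i y_i."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))


def lift(space):
    """Prepend the time coordinate so that the point satisfies <y, y>_L = -1."""
    return [math.sqrt(1.0 + sum(s * s for s in space))] + list(space)


def lorentz_cross_channel(patches, weight):
    """Apply the same map to every patch; each weight row has len(patch)
    entries and produces one output spatial coordinate."""
    return [lift([sum(w * v for w, v in zip(row, p)) for row in weight])
            for p in patches]


def lorentz_cross_patch(patches, w_p):
    """Assumed variant: mix spatial coordinates across patches with an N x N
    weight (applied to each channel), then re-lift every patch."""
    n, c = len(patches), len(patches[0]) - 1
    mixed = [[sum(w_p[i][j] * patches[j][1 + ch] for j in range(n)) for ch in range(c)]
             for i in range(n)]
    return [lift(row) for row in mixed]


patches = [lift([0.1, 0.2]), lift([-0.3, 0.0]), lift([0.5, -0.4])]  # N = 3, C = 2
w_c = [[0.0, 1.0, 0.5], [0.2, -0.5, 1.0]]                           # 2 outputs x 3 inputs
w_p = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]]           # N x N

out = lorentz_cross_patch(lorentz_cross_channel(patches, w_c), w_p)
for p in out:
    print(round(lorentz_inner(p, p), 9))  # -1.0 for every patch
```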

4.4. Hyperbolic MLP Head

Once the features are learned on the manifold, we need to predict the class of each sample from its feature representation. However, because these features lie in non-Euclidean space, the Euclidean multinomial logistic regression (MLR) cannot be used to compute the logits of each category. Therefore, we build a hyperbolic MLP head that makes class predictions from the Lorentz features. In particular, we build a learnable classification hyperplane for each class and compute the Lorentz distance from the feature to each hyperplane based on Equation (3). The prediction is then made based on the shortest distance.
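Equation (3) is not reproduced in this section, so the sketch below substitutes a standard closed-form expression for the distance from a point $x$ on the hyperboloid (curvature $-1$) to the hyperplane $\{z : \langle w, z\rangle_{\mathcal{L}} = 0\}$ with a spacelike normal $w$, namely $\operatorname{arcsinh}\big(|\langle w, x\rangle_{\mathcal{L}}| / \sqrt{\langle w, w\rangle_{\mathcal{L}}}\big)$. It then picks the class with the shortest distance, as described above. The function names and the per-class normals are hypothetical.

```python
import math


def lorentz_inner(x, y):
    """Lorentzian inner product <x, y>_L = -x_0 y_0 + sum_i x_i y_i."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))


def distance_to_hyperplane(x, w):
    """Distance from x (on the curvature -1 hyperboloid) to {z : <w, z>_L = 0}."""
    w_norm_sq = lorentz_inner(w, w)
    if w_norm_sq <= 0:
        raise ValueError("the hyperplane normal must be spacelike (<w, w>_L > 0)")
    return math.asinh(abs(lorentz_inner(w, x)) / math.sqrt(w_norm_sq))


def predict(x, class_normals):
    """Return the index of the closest class hyperplane and all distances."""
    dists = [distance_to_hyperplane(x, w) for w in class_normals]
    return min(range(len(dists)), key=dists.__getitem__), dists


t = 0.7
x = [math.cosh(t), math.sinh(t), 0.0]   # a point on the 2D hyperboloid
normals = [[0.0, 1.0, 0.0],             # class 0: hyperplane z_1 = 0
           [0.0, 0.0, 1.0]]             # class 1: hyperplane z_2 = 0 (contains x)
label, dists = predict(x, normals)
print(label, [round(d, 6) for d in dists])  # 1 [0.7, 0.0]
```

In a trained head, the normals would be learnable parameters, one per class.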

Compared with the vision transformer and its variants, our method is more parameter-efficient because it replaces the self-attention sublayer with a linear layer and a GELU nonlinearity. This design also stabilizes training, as it eliminates batch-specific or cross-channel normalization (e.g., BatchNorm, GroupNorm, or LayerNorm), and it follows the same training scheme as DeiT [54].

5. Experiment

We trained HR-MLP and the baseline models on CIFAR10, CIFAR100, and MiniImageNet for comparison. All models were trained from scratch; the following subsections describe the datasets and the hyperparameters used for training.

5.1. Datasets and Metrics

The proposed method is evaluated on three publicly available datasets, i.e., CIFAR10 [44], CIFAR100 [44], and MiniImageNet [55]. CIFAR10 and CIFAR100 are commonly used in image classification tasks; as their names indicate, they contain 10 and 100 classes, respectively. MiniImageNet consists of 60,000 color (RGB) images of size $84 \times 84$ in 100 classes, with 600 examples per class. This dataset is more complex than CIFAR10 and CIFAR100 but still fits in the memory of modern machines, making it convenient for rapid prototyping and experimentation.
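The claim that MiniImageNet fits in memory can be checked with simple arithmetic. The sketch assumes the images are stored as raw 8-bit RGB arrays with no compression or container overhead; a float32 copy is four times larger.

```python
num_classes, images_per_class = 100, 600
height = width = 84
channels = 3  # RGB

num_images = num_classes * images_per_class             # 60,000 images
bytes_uint8 = num_images * height * width * channels    # one byte per pixel value
bytes_float32 = 4 * bytes_uint8                         # four bytes per pixel value

print(num_images, bytes_uint8)                                      # 60000 1270080000
print(round(bytes_uint8 / 1e9, 2), round(bytes_float32 / 1e9, 2))  # 1.27 5.08
```

Roughly 1.27 GB as uint8 (or 5.08 GB as float32) is within the RAM of a typical modern workstation.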

We report the accuracy on the test split of each dataset. We compare our method with state-of-the-art methods, including a CNN, MLP mixer [8], ResMLP [9], Vision Transformer (ViT) [5], and EfficientViT [56], in a compact setting (HR-MLP has about 1 million parameters, and every model has fewer than 6 million; see Table 1). Even though the neural architectures and embedding spaces differ, we aim for a fair comparison by using architectures with comparable, small parameter counts.

5.2. Implementation

For CIFAR10 and CIFAR100, we used a batch size of 32 in all experiments. We used the cross-entropy loss with a label smoothing value of 0.1. All models were trained on the Tesla P100 16 GB GPU available on Kaggle and used the GELU activation function [57] for nonlinearity. HR-MLP was optimized with Riemannian SGD [58] with a learning rate of 0.005 and a weight decay of 0.05. All Euclidean models (ResMLP-S12, ViT, EfficientViT, the MLP mixer, and the CNN) used the AdamW [59] optimizer with a learning rate of 0.005 and a weight decay of 0.05. All models were trained for 150 epochs with a 20-epoch warm-up and a cosine annealing scheduler starting from epoch 40. No data augmentation was used beyond converting the images to tensors and normalizing them. Models were evaluated by their accuracy on each dataset’s test set. For MiniImageNet, we kept the same hyperparameters but used a batch size of 256 and a Tesla V100 GPU. Our code is available at https://github.com/Ahmad-Omar-Ahsan/HR-MLP.
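Two details of this setup can be made concrete. The sketch below assumes one reading of the schedule description (linear warm-up over the first 20 epochs, a constant rate until epoch 40, then cosine annealing to zero by epoch 150) and implements label-smoothed cross-entropy in its usual form, with target weight $1-\varepsilon$ on the true class plus $\varepsilon/K$ spread uniformly over all $K$ classes. The exact schedule in the released code may differ, and the function names are hypothetical.

```python
import math

BASE_LR = 0.005
EPOCHS, WARMUP, COSINE_START = 150, 20, 40


def lr_at(epoch):
    """Assumed per-epoch schedule: linear warm-up, constant, then cosine decay to 0."""
    if epoch < WARMUP:
        return BASE_LR * (epoch + 1) / WARMUP
    if epoch < COSINE_START:
        return BASE_LR
    progress = (epoch - COSINE_START) / (EPOCHS - COSINE_START)
    return 0.5 * BASE_LR * (1.0 + math.cos(math.pi * progress))


def smoothed_cross_entropy(logits, target, eps=0.1):
    """Cross-entropy against the target (1 - eps) * one_hot + eps / K."""
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    k = len(logits)
    return -sum(((1.0 - eps) * (i == target) + eps / k) * (z - log_z)
                for i, z in enumerate(logits))


print([round(lr_at(e), 6) for e in (0, 19, 39, 40, 95, 149)])
print(round(smoothed_cross_entropy([2.0, 0.5, -1.0], target=0), 4))
```

With uniform logits the smoothed loss equals $\log K$, and with $\varepsilon = 0$ it reduces to the ordinary cross-entropy.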

6. Results

We compared our method with a traditional CNN architecture and with recent MLP-based and transformer-based architectures. As listed in Table 1, the proposed method achieves the highest accuracy on all three datasets. For instance, on CIFAR10, our method is at least 13 percentage points more accurate than the MLP-based baselines with similar parameter counts (76.44% vs. 62.77% for ResMLP-S12 and 59.01% for the MLP mixer), and 14.58 points more accurate than the CNN (61.86%). On CIFAR100, with around 1M parameters, our method achieves 48.55%, whereas the CNN achieves only 29.59%. On the more challenging MiniImageNet dataset, our method achieves 37.47%, which is 17.3 percentage points higher than the CNN and also higher than the strongest baseline on this dataset, ResMLP-S12 (36.83%), supporting the effectiveness of learning in hyperbolic space. Our method also outperforms the vision transformer models: it is more accurate than EfficientViT-B0 on all three datasets with roughly half of its parameters, and more accurate than ViT with about one-sixth of its parameters.
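The percentage-point gaps and parameter ratios discussed in this section can be recomputed directly from Table 1. The short script below only does that arithmetic; the numbers are copied from the table and nothing is re-measured.

```python
# (parameters in M, CIFAR10 %, CIFAR100 %, MiniImageNet %) copied from Table 1
baselines = {
    "CNN": (2.112, 61.86, 29.59, 20.15),
    "MLP mixer": (1.005, 59.01, 27.01, 30.30),
    "ResMLP-S12": (0.685, 62.77, 35.29, 36.83),
    "ViT": (5.912, 59.2, 32.23, 21.67),
    "EfficientViT-B0": (2.142, 75.04, 46.05, 34.43),
}
hr_mlp = (1.038, 76.44, 48.55, 37.47)

for name, row in baselines.items():
    # Accuracy gap of HR-MLP over this baseline, in percentage points, per dataset.
    gaps = [round(ours - theirs, 2) for ours, theirs in zip(hr_mlp[1:], row[1:])]
    # How many times more parameters the baseline has than HR-MLP.
    ratio = round(row[0] / hr_mlp[0], 2)
    print(f"{name:<16} gaps (pp) C10/C100/Mini = {gaps}, params ratio = {ratio}")
```

On CIFAR10, the gap is 1.40 points against EfficientViT-B0 and above 13 points against each of the other four baselines.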

To summarize, in this compact setting (which is valuable for local applications with limited resources), our method outperforms the Euclidean baselines on all three public datasets. This provides evidence for the potential of learning image embeddings in hyperbolic space for downstream tasks.

We also visualize the feature distributions using t-SNE [60] to further illustrate the advantage of the proposed hyperbolic neural network. As shown in Figure 7, even though the models are very small, the hyperbolic features form clear clusters for different categories. In contrast, the Euclidean counterpart appears to struggle to separate the categories with such limited capacity. This further supports the effectiveness of the proposed method.

7. Discussion

The strength of hyperbolic geometry lies in its inherent nonlinearity, which gives neural networks the potential to capture richer information than Euclidean counterparts with similar architectures. This nonlinearity yields a distance metric in hyperbolic space that differs markedly from that of flat Euclidean space. Consequently, neural networks operating in hyperbolic space are compelled to learn more intricate and fine-grained feature representations, because features that are ‘near’ each other in Euclidean space can be considerably distant in hyperbolic space.
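A small numeric example makes the last point concrete. The sketch below uses the Poincaré ball model (curvature $-1$), chosen here only because its distance formula is short, and compares two pairs of points that are the same Euclidean distance (0.01) apart: one pair near the origin and one pair near the boundary.

```python
import math


def sq_norm(a):
    """Squared Euclidean norm of a vector given as a list."""
    return sum(x * x for x in a)


def poincare_distance(u, v):
    """Geodesic distance between u and v in the Poincare ball (curvature -1)."""
    diff = sq_norm([a - b for a, b in zip(u, v)])
    return math.acosh(1.0 + 2.0 * diff / ((1.0 - sq_norm(u)) * (1.0 - sq_norm(v))))


near_origin = poincare_distance([0.0, 0.0], [0.01, 0.0])
near_boundary = poincare_distance([0.98, 0.0], [0.99, 0.0])

print(round(near_origin, 4), round(near_boundary, 4))  # 0.02 0.6982
```

The same Euclidean gap corresponds to a hyperbolic distance about 35 times larger near the boundary, which is the sense in which Euclidean neighbours can be far apart in hyperbolic space.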

However, it is essential to note that this nonlinearity, while advantageous, also brings certain limitations. For instance, it increases the computational demands of the model. Unlike Euclidean networks, which can often be decoupled into linear layers and nonlinear activations, hyperbolic networks require a more intertwined and challenging learning process. Additionally, there is no definitive evidence that hyperbolic neural networks are superior when there are no parameter constraints. In other words, it remains unproven whether larger hyperbolic models consistently outperform their Euclidean counterparts, which raises questions about the scalability and practicality of hyperbolic models in certain scenarios. Valuable directions for future research therefore include the scalability and efficiency of hyperbolic neural networks.

8. Conclusions

This paper proposes a new framework for image classification in hyperbolic space. Inspired by the success of hyperbolic graph neural networks in graph learning, we model the underlying hierarchical relationships using hyperbolic geometry. Our approach combines the recent resurgence of MLP architectures with the exponential growth of hyperbolic space to address the research gap in applying hyperbolic neural networks to computer vision tasks. The proposed method constructs ResMLP (including the cross-channel and cross-patch layers) in hyperbolic space, with all neural operations performed in hyperbolic space and without back-and-forth transformations between Euclidean and non-Euclidean spaces. Our method offers several advantages for real-world image classification with hierarchies: it provides feature embeddings with less distortion, the models are generally much more compact than their Euclidean counterparts, and such hyperbolic neural architectures provide better interpretability. The experimental results show that our method outperforms previous methods on different datasets in this compact setting. By embracing the hierarchical structures within images and leveraging the power of hyperbolic spaces, our work offers novel insights and presents a promising direction for advancing computer vision techniques.

However, one limitation of our hyperbolic method is that it focuses on compact models, which provide better interpretability, and does not consider settings with abundant computational resources. A natural next step is to explore larger hyperbolic models and compare them with their Euclidean counterparts. Despite this limitation, our approach has proven effective in practice, especially when resources are limited. In the future, we plan to explore larger architectures, potentially as deep and wide as ViT. We also aim to apply our method to more datasets and to investigate strategies for improving the classification model as the diversity of the training data increases.

Author Contributions

Methodology and discussion, A.O.A., S.T. and W.P.; software, A.O.A.; writing—original draft, A.O.A.; writing—review and editing, A.O.A., S.T. and W.P.; supervision, W.P.; formal analysis, A.O.A. and W.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding. The authors thank the Fatima Fellowship for providing computational resources.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–26 June 2005; Volume 1, pp. 886–893.
2. Ahonen, T.; Hadid, A.; Pietikäinen, M. Face recognition with local binary patterns. In Proceedings of the Computer Vision-ECCV 2004: 8th European Conference on Computer Vision, Prague, Czech Republic, 11–14 May 2004; Proceedings, Part I 8. Springer: Berlin/Heidelberg, Germany, 2004; pp. 469–481.
3. Lindeberg, T. Scale Invariant Feature Transform. 2012. Available online: http://www.scholarpedia.org/article/Scale_Invariant_Feature_Transform (accessed on 20 September 2023).
4. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 25.
5. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In Proceedings of the International Conference on Learning Representations 2021, Vienna, Austria, 4 May 2021.
6. Peng, W.; Hong, X.; Chen, H.; Zhao, G. Learning graph convolutional network for skeleton-based human action recognition by neural searching. In Proceedings of the AAAI Conference on Artificial Intelligence 2020, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 2669–2676.
7. Peng, W.; Hong, X.; Zhao, G. Video action recognition via neural architecture searching. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; pp. 11–15.
8. Tolstikhin, I.O.; Houlsby, N.; Kolesnikov, A.; Beyer, L.; Zhai, X.; Unterthiner, T.; Yung, J.; Steiner, A.; Keysers, D.; Uszkoreit, J.; et al. Mlp-mixer: An all-mlp architecture for vision. Adv. Neural Inf. Process. Syst. 2021, 34, 24261–24272.
9. Touvron, H.; Bojanowski, P.; Caron, M.; Cord, M.; El-Nouby, A.; Grave, E.; Izacard, G.; Joulin, A.; Synnaeve, G.; Verbeek, J.; et al. Resmlp: Feedforward networks for image classification with data-efficient training. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 5314–5321.
10. Liu, H.; Dai, Z.; So, D.; Le, Q.V. Pay attention to mlps. Adv. Neural Inf. Process. Syst. 2021, 34, 9204–9215.
11. Melas-Kyriazi, L. Do you even need attention? A stack of feed-forward layers does surprisingly well on imagenet. arXiv 2021, arXiv:2105.02723.
12. Peng, W.; Shi, J.; Varanka, T.; Zhao, G. Rethinking the ST-GCNs for 3D skeleton-based human action recognition. Neurocomputing 2021, 454, 45–53.
13. Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. Imagenet large scale visual recognition challenge. Int. J. Comput. Vis. 2015, 115, 211–252.
14. Peng, W.; Varanka, T.; Mostafa, A.; Shi, H.; Zhao, G. Hyperbolic deep neural networks: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 10023–10044.
15. Bronstein, M.M.; Bruna, J.; LeCun, Y.; Szlam, A.; Vandergheynst, P. Geometric deep learning: Going beyond euclidean data. IEEE Signal Process. Mag. 2017, 34, 18–42.
16. Liu, Q.; Nickel, M.; Kiela, D. Hyperbolic graph neural networks. Adv. Neural Inf. Process. Syst. 2019, 32.
17. Chami, I.; Ying, Z.; Ré, C.; Leskovec, J. Hyperbolic graph convolutional neural networks. Adv. Neural Inf. Process. Syst. 2019, 32.
18. Zhu, Y.; Zhou, D.; Xiao, J.; Jiang, X.; Chen, X.; Liu, Q. HyperText: Endowing FastText with Hyperbolic Geometry. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, Online, 16–20 November 2020; Association for Computational Linguistics: Cedarville, OH, USA, 2020; pp. 1166–1171.
19. López, F.; Heinzerling, B.; Strube, M. Fine-Grained Entity Typing in Hyperbolic Space. In Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019), Florence, Italy, 2 August 2019; Association for Computational Linguistics: Florence, Italy, 2019; pp. 169–180.
20. Dhingra, B.; Shallue, C.; Norouzi, M.; Dai, A.; Dahl, G. Embedding Text in Hyperbolic Spaces. In Proceedings of the Twelfth Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-12), New Orleans, LA, USA, 6 June 2018; Association for Computational Linguistics: New Orleans, LA, USA, 2018; pp. 59–69.
21. Tifrea, A.; Becigneul, G.; Ganea, O.E. Poincaré GloVe: Hyperbolic Word Embeddings. In Proceedings of the 7th International Conference on Learning Representations (ICLR), New Orleans, LA, USA, 6–9 May 2019.
22. Nickel, M.; Kiela, D. Poincaré embeddings for learning hierarchical representations. Adv. Neural Inf. Process. Syst. 2017, 30.
23. Peng, W.; Shi, J.; Xia, Z.; Zhao, G. Mix dimension in poincaré geometry for 3d skeleton-based action recognition. In Proceedings of the 28th ACM International Conference on Multimedia 2020, Seattle, WA, USA, 12–16 October 2020; pp. 1432–1440.
24. Bachmann, G.; Bécigneul, G.; Ganea, O. Constant curvature graph convolutional networks. In Proceedings of the International Conference on Machine Learning, PMLR 2020, Virtual, 13–18 July 2020; pp. 486–496.
25. Khrulkov, V.; Mirvakhabova, L.; Ustinova, E.; Oseledets, I.; Lempitsky, V. Hyperbolic image embeddings. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2020, Seattle, WA, USA, 14–19 June 2020; pp. 6418–6428.
26. Weber, M.; Zaheer, M.; Rawat, A.S.; Menon, A.K.; Kumar, S. Robust large-margin learning in hyperbolic space. Adv. Neural Inf. Process. Syst. 2020, 33, 17863–17873.
27. Mathieu, E.; Le Lan, C.; Maddison, C.J.; Tomioka, R.; Teh, Y.W. Continuous hierarchical representations with poincaré variational auto-encoders. Adv. Neural Inf. Process. Syst. 2019, 32.
28. Skopek, O.; Ganea, O.E.; Becigneul, G. Mixed-curvature Variational Autoencoders. In Proceedings of the 8th International Conference on Learning Representations (ICLR) 2020, Addis Ababa, Ethiopia, 26–30 April 2020.
29. Chen, X.; Duan, Y.; Houthooft, R.; Schulman, J.; Sutskever, I.; Abbeel, P. Infogan: Interpretable representation learning by information maximizing generative adversarial nets. Adv. Neural Inf. Process. Syst. 2016, 29.
30. Peng, W.; Hong, X.; Xu, Y.; Zhao, G. A boost in revealing subtle facial expressions: A consolidated eulerian framework. In Proceedings of the 2019 14th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2019), Lille, France, 14–18 May 2019; pp. 1–5.
31. Yang, J.; Jiang, Y.G.; Hauptmann, A.G.; Ngo, C.W. Evaluating bag-of-visual-words representations in scene classification. In Proceedings of the International Workshop on Workshop on Multimedia Information Retrieval 2007, Bavaria, Germany, 24–29 September 2007; pp. 197–206.
32. Hearst, M.A.; Dumais, S.T.; Osuna, E.; Platt, J.; Scholkopf, B. Support vector machines. IEEE Intell. Syst. Their Appl. 1998, 13, 18–28.
33. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
34. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2016, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
35. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2015, Boston, MA, USA, 7–12 June 2015; pp. 1–9.
36. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2017, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708.
37. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2018, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7132–7141.
38. Tan, M.; Le, Q. Efficientnet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, PMLR 2019, Long Beach, CA, USA, 9–15 June 2019; pp. 6105–6114.
39. Mikołajczyk, A.; Grochowski, M. Data augmentation for improving deep learning in image classification problem. In Proceedings of the 2018 International Interdisciplinary PhD Workshop (IIPhDW), Swinoujscie, Poland, 9–12 May 2018; pp. 117–122.
40. Shaha, M.; Pawar, M. Transfer learning for image classification. In Proceedings of the 2018 Second International Conference on Electronics, Communication and Aerospace Technology (ICECA), Coimbatore, India, 29–31 March 2018; pp. 656–660.
41. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. In Proceedings of the 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, 7–9 May 2015; pp. 1–14.
42. Zoph, B.; Le, Q. Neural Architecture Search with Reinforcement Learning. In Proceedings of the International Conference on Learning Representations 2017, Toulon, France, 24–26 April 2017.
43. Chen, W.; Han, X.; Lin, Y.; Zhao, H.; Liu, Z.; Li, P.; Sun, M.; Zhou, J. Fully Hyperbolic Neural Networks. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Dublin, Ireland, 22–27 May 2022; Association for Computational Linguistics: Dublin, Ireland, 2022; pp. 5672–5686.
44. Krizhevsky, A.; Hinton, G. Learning Multiple Layers of Features from Tiny Images. 2009. Available online: https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf (accessed on 20 September 2023).
45. Wah, C.; Branson, S.; Welinder, P.; Perona, P.; Belongie, S. The Caltech-Ucsd Birds-200-2011 Dataset. 2011. Available online: https://paperswithcode.com/dataset/cub-200-2011 (accessed on 20 September 2023).
46. Atigh, M.G.; Schoep, J.; Acar, E.; Van Noord, N.; Mettes, P. Hyperbolic image segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 4453–4462.
47. Beltrami, E. Teoria Fondamentale Degli Spazii di Curvatura Costante Memoria. Ann. Mat. 1868, 2, 232–255.
48. Cannon, J.W.; Floyd, W.J.; Kenyon, R.; Parry, W.R. Hyperbolic geometry. Flavors Geom. 1997, 31, 2.
49. van Wyk, G.J.; Bosman, A.S. Evolutionary neural architecture search for image restoration. In Proceedings of the 2019 International Joint Conference on Neural Networks (IJCNN), Budapest, Hungary, 14–19 July 2019; pp. 1–8.
50. Wang, W.; Xie, E.; Li, X.; Fan, D.P.; Song, K.; Liang, D.; Lu, T.; Luo, P.; Shao, L. Pvt v2: Improved baselines with pyramid vision transformer. Comput. Vis. Media 2022, 8, 415–424.
51. Ganea, O.; Bécigneul, G.; Hofmann, T. Hyperbolic neural networks. Adv. Neural Inf. Process. Syst. 2018, 31.
52. Nickel, M.; Kiela, D. Learning continuous hierarchies in the lorentz model of hyperbolic geometry. In Proceedings of the International Conference on Machine Learning, PMLR 2018, Vienna, Austria, 10–15 July 2018; pp. 3779–3788.
53. Moretti, V. The interplay of the polar decomposition theorem and the Lorentz group. arXiv 2002, arXiv:math-ph/0211047.
54. Touvron, H.; Cord, M.; Douze, M.; Massa, F.; Sablayrolles, A.; Jégou, H. Training data-efficient image transformers & distillation through attention. In Proceedings of the International Conference on Machine Learning, PMLR 2021, Virtual Event, 18–24 July 2021; pp. 10347–10357.
55. Vinyals, O.; Blundell, C.; Lillicrap, T.; Wierstra, D. Matching networks for one shot learning. In Proceedings of the Advances in Neural Information Processing Systems 29 (NIPS 2016), Barcelona, Spain, 5–10 December 2016.
56. Cai, H.; Gan, C.; Han, S. Efficientvit: Enhanced linear attention for high-resolution low-computation visual recognition. arXiv 2022, arXiv:2205.14756.
57. Hendrycks, D.; Gimpel, K. Gaussian error linear units (gelus). arXiv 2016, arXiv:1606.08415.
58. Becigneul, G.; Ganea, O.E. Riemannian Adaptive Optimization Methods. In Proceedings of the International Conference on Learning Representations 2019, New Orleans, LA, USA, 6–9 May 2019.
59. Loshchilov, I.; Hutter, F. Decoupled Weight Decay Regularization. In Proceedings of the International Conference on Learning Representations 2019, New Orleans, LA, USA, 6–9 May 2019.
60. van der Maaten, L.; Hinton, G. Visualizing Data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605.

Figure 1. The equivalent statements for the fifth postulate. (a) If a straight line intersects with one of the two parallels, it will also intersect with the other. (b) There is one (and only one) line that passes through any given point and is parallel to the given line. (c) There is a triangle in which the sum of the three angles is 180°. (d) Given any figure, there exists a figure with the same shape of any size.

Figure 2. Illustration of the Klein model (left two) and the Poincaré model (right two) in hyperbolic space. Leftmost: the relationship between the Lorentz model and the Klein model. Second from the left: examples of a ‘straight line’ in the Klein model. Rightmost: the Poincaré model and examples of a ‘straight line’ in it. Second from the right: its relationship with the Lorentz model.

Figure 3. Illustration of the relationship in the five hyperbolic models. Here, the five models are represented in a two-dimensional space. The points $h \in H$, $b \in B$, $j \in J$, $k \in K$, and $l \in L$ can be thought of as the same point in hyperbolic space.

Figure 4. Architecture of hyperbolic Res-MLP (HR-MLP). There are three main modules in this framework, which are the Lorentz linear embedding, the hyperbolic residual block (HR block), and the hyperbolic MLP head. Through the first module, the feature is projected into hyperbolic space. The feature is further updated by HR block when performing information aggregation using the cross-patch and cross-channel operations in it. Finally, the class prediction is made by a hyperbolic MLP head.

Figure 5. The Lorentz linear layer in the HR block. There are two modules in this layer, which are the linear module and the scaling module, respectively. The former module performs the linear transformation for the input (nonlinearity can also be applied) and then the following scaling module ensures the feature lies on the manifold.

Figure 6. The HR block in HR-MLP. The HR block consists of hyperbolic operations that transform the input features and ensure that they lie on the manifold. The features are first transformed by the Lorentz cross-channel layer applied to the patches. After the tensor is transposed, it is passed through the Lorentz cross-patch layer applied to the channels. Finally, the transformed tensor is added to the input via a skip connection, and the result passes through a scaling module, which ensures that the features lie back on the Lorentz manifold.

Figure 7. Visualization of the features. The t-SNE feature visualization on the CIFAR10 dataset. Here, we compare our method with its Euclidean counterpart, keeping all modules the same except for our modules in hyperbolic space. The features from the proposed method form much clearer class clusters. (Different colors denote different categories.)

Table 1. Test accuracy on CIFAR10, CIFAR100, and MiniImageNet for HR-MLP and the baseline models (CNN, MLP mixer, ResMLP-S12, ViT, and EfficientViT-B0), with parameter counts and FLOPs.

Methods	Parameters (M)	FLOPs (M)	CIFAR10	CIFAR100	MiniImageNet
CNN	2.112	10	61.86%	29.59%	20.15%
MLP mixer	1.005	13.3	59.01%	27.01%	30.30%
ResMLP-S12	0.685	81	62.77%	35.29%	36.83%
ViT	5.912	89.2	59.2%	32.23%	21.67%
EfficientViT-B0	2.142	10	75.04%	46.05%	34.43%
HR-MLP (Ours)	1.038	219	76.44%	48.55%	37.47%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Ahsan, A.O.; Tang, S.; Peng, W. Efficient Hyperbolic Perceptron for Image Classification. Electronics 2023, 12, 4027. https://doi.org/10.3390/electronics12194027

AMA Style

Ahsan AO, Tang S, Peng W. Efficient Hyperbolic Perceptron for Image Classification. Electronics. 2023; 12(19):4027. https://doi.org/10.3390/electronics12194027

Chicago/Turabian Style

Ahsan, Ahmad Omar, Susanna Tang, and Wei Peng. 2023. "Efficient Hyperbolic Perceptron for Image Classification" Electronics 12, no. 19: 4027. https://doi.org/10.3390/electronics12194027

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
