Few Shot Class Incremental Learning via Grassmann Manifold and Information Entropy

Gu, Ziqi; Lu, Zihan; Han, Cao; Xu, Chunyan

doi:10.3390/electronics12214511

PCA Lab, Key Lab of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China

Author to whom correspondence should be addressed.

Electronics 2023, 12(21), 4511; https://doi.org/10.3390/electronics12214511

Submission received: 27 September 2023 / Revised: 19 October 2023 / Accepted: 26 October 2023 / Published: 2 November 2023

(This article belongs to the Section Artificial Intelligence)

Abstract

Few-shot class incremental learning is a challenging problem in machine learning. It requires models to gradually learn new knowledge from a few samples while retaining the knowledge of old classes. However, the limited data available for new classes not only leads to significant overfitting but also exacerbates catastrophic forgetting during incremental learning. To address these two issues, we propose a novel framework named Grassmann Manifold and Information Entropy for Few-Shot Class Incremental Learning (GMIE-FSCIL). Unlike existing methods that model parameters in Euclidean space, our method optimizes the incremental learning network on the Grassmann manifold. More specifically, the Grassmann Metric Learning (GML) module incorporates the acquired knowledge of each class on the Grassmann manifold, preserving its inherent geometric properties. Recognizing that knowledge is interconnected, the Graph Information Preserving (GIP) module uses information entropy to build a neighborhood graph on the Grassmann manifold that maintains inter-class structural information, thus mitigating catastrophic forgetting of learned knowledge. In our evaluation on the CIFAR100, miniImageNet, and CUB200 datasets, we achieved improvements in average accuracy (Avg) over mainstream methods of at least 2.72%, 1.21%, and 1.27%, respectively.

Keywords: few-shot class incremental learning; Grassmann manifold; information entropy; graph embedding

1. Introduction

In computer vision, significant advances have been made through deep learning techniques [1,2,3]. These achievements can be attributed to the abundance of labeled data and remarkable computational capabilities. Nevertheless, in numerous real-world scenarios it is difficult to curate a comprehensive supervised dataset on a large scale. Furthermore, models trained by supervised learning are limited to making predictions within a finite set of predefined classes. A neural network capable of continual learning is therefore valuable, as it can acquire new classes from only a limited number of labeled samples. In contrast to machine learning systems, humans excel at grasping new knowledge from minimal examples and retaining previous knowledge without catastrophic forgetting. This research addresses the challenges of few-shot continual/incremental learning tasks [4,5]. In this setting, two primary issues arise: (i) effectively acquiring knowledge of new classes from only a few labeled samples, and (ii) mitigating catastrophic forgetting to ensure the retention of previously learned knowledge.

Recently, there has been a surge of research on few-shot class incremental learning, with numerous studies exploring the area from diverse perspectives. Several methods [6,7,8] use graph models or neural gas networks to establish topological structures, which helps retain knowledge and guards against catastrophic forgetting. To learn new classes effectively while minimizing the forgetting of previously acquired ones, the forward compatible training method [9] incorporates virtual prototypes to compress the embedding of learned classes and predicts potential new classes, thereby addressing both problems. Similarly, some works leverage semantic information and knowledge distillation to address the challenges of few-shot class incremental learning, as demonstrated in [10]. Several meta-learning based methods [11,12] have emerged to enhance the capability of neural networks to acquire novel classes while minimizing interference with previously learned classes during incremental learning. To enhance the feature extraction capability of the model, researchers have explored self-supervised methods [5,13], which incorporate a self-supervised loss function into the training process. To address the challenges of forgetting and overfitting, a combination of subspace and synthetic feature techniques was proposed in [14]. Furthermore, a self-promoted prototype learning method was introduced in [15] to explicitly learn feature representations for few-shot class incremental learning. This approach facilitates subsequent continual tasks by focusing on learning and updating prototypes specific to each class. Zhang et al. [16] introduced efficient prototype replay and calibration (EPRC) to enhance classification performance and mitigate catastrophic forgetting for old classes and overfitting for novel classes.

In this work, we propose a novel framework named Grassmann Manifold and Information Entropy for Few-Shot Class Incremental Learning (GMIE-FSCIL). It allows deep neural networks to learn incrementally from a sequential stream of few-shot labeled data. Typically, incremental learning models are trained with a large amount of labeled data in the base session. During the incremental learning sessions, however, only a few samples of unknown classes may be available. To address this challenge, we depart from most existing methods and optimize the network parameters on the Grassmann manifold rather than in Euclidean space. This lets us better leverage the geometric properties and linear subspace information, resulting in improved incremental learning performance.

This approach enables effective learning of new classes with limited labeled data while mitigating the risk of overfitting. Specifically, we propose a Grassmann Metric Learning (GML) module that measures the distance between each training sample and its corresponding class pattern on the Grassmann manifold. This module leverages the linear subspace properties of the Grassmann manifold to better utilize its geometric characteristics. The GML module helps preserve geometric properties, reduces the risk of overfitting, and improves adaptability to new classes during the incremental learning process. Additionally, recognizing that human knowledge is not isolated but instead demonstrates interconnected structural relationships [17], we aim to maintain the inter-class structural information of previously learned classes during incremental learning. To mitigate the detrimental effects of catastrophic forgetting in the incremental sessions, we propose a Graph Information Preservation (GIP) module. This module first constructs a neighborhood graph on the Grassmann manifold and then preserves the inter-class structural relationships of prior knowledge by calculating the information entropy of the graphs of adjacent sessions. We summarize the main contributions of this work as follows:

We introduce a framework named Grassmann Manifold and Information Entropy for Few-Shot Class Incremental Learning (GMIE-FSCIL), which allows the network to continually learn new knowledge from few-shot labeled data.
Drawing inspiration from the Grassmann manifold and information entropy, we introduce two modules, namely Grassmann Metric Learning (GML) and Graph Information Preservation (GIP), to address the problems of few-shot class incremental learning.
Experimental results show that our framework outperforms mainstream methods on three datasets: CUB200 [18], CIFAR100 [19], and miniImageNet [20].

The remainder of this article is arranged as follows. Section 2 reviews related work. Section 3 describes our proposed Grassmann Metric Learning module and Graph Information Preservation module in detail. Section 4 conducts comparison experiments and ablation experiments on three datasets. Finally, Section 5 summarizes our work.

2. Related Work

2.1. Few Shot Class Incremental Learning

In recent years, there has been an increasing focus on the field of few-shot class incremental learning, which aims to enable models to learn incrementally from a series of few-shot labeled data. Multiple strategies have been proposed to tackle the associated challenges, including mitigating forgetting, reducing overfitting, and handling data imbalance. For instance, Tao et al. [6] proposed the utilization of a neural gas network (NG) to acquire and maintain a topological structure that captures different category features. This enables effective few-shot continual learning by preserving the relationships between categories. Another study by Zhang et al. [7] introduced an evolving classifier based on a graph attention network, which incorporates graph models to facilitate information propagation between classifiers. This approach supports knowledge transfer and adaptation to new classes in few-shot scenarios. Compared to the aforementioned methods, our method involves projecting the network onto the Grassmann manifold. By harnessing the properties of the Grassmann manifold, we can effectively retain the geometric properties and structural relationships.

To address the issue of catastrophic forgetting, Zhou et al. proposed the forward compatible training (FACT) method [9]. FACT generates virtual classes and incorporates new classes in a forward-compatible setting while retaining knowledge of previously learned classes. This strategy ensures that the model maintains its performance on prior classes while accommodating new ones. Researchers have also explored bi-level optimization approaches based on meta-learning [11]. These techniques optimize models to reduce forgetting of previously learned knowledge while adapting to new classes. In addition, self-supervised stochastic classifiers were introduced to address few-shot class incremental learning [13]. These classifiers leverage self-supervision mechanisms to enhance performance in this context. The semantic-aware knowledge distillation approach [10] considers word embeddings as additional information and incorporates knowledge distillation terms to mitigate forgetting during incremental learning. Furthermore, Zhu et al. [15] proposed an incremental prototype learning scheme that fine-tunes class prototypes using known prototypes and few-shot samples from new classes. This method effectively trains the model while updating the prototypes as needed. To balance the trade-off between accuracy and computational memory cost when learning novel classes, Michael et al. [21] developed the Constrained Few-shot class incremental Learning (C-FSCIL) method. C-FSCIL employs hyper-dimensional embedding to enable continual learning of additional classes beyond the fixed number of dimensions in the feature space. Zhang et al. proposed efficient prototype replay and calibration (EPRC) [16] to boost classification performance and alleviate catastrophic forgetting for previously learned classes and overfitting for new classes. These approaches exemplify ongoing efforts to enhance few-shot class incremental learning by introducing techniques that address various challenges and optimize model performance.

Hosein et al. [22] introduced the challenge of continual learning in an open set context and used the Openmax method. Openmax was initially designed for open set recognition in optical images; they employed it to enhance the performance of a network in incremental learning scenarios with SAR images. While the aforementioned methods can effectively alleviate the issues of incremental learning, they often fail to adequately preserve inter-class relationships. Our approach addresses this gap by constructing neighborhood graphs to maintain these inter-class relationships.

2.2. Graph Manifold Embedding

In recent research, several approaches have emerged to explore graph manifold embedding techniques. For instance, Yan et al. [23] proposed a supervised method called Marginal Fisher Analysis (MFA) to reduce data dimensionality. MFA constructs two graphs that capture compactness within classes and separability between classes. Wang et al. [24] introduced the geometry-aware graph embedding projection metric learning (GEPML) algorithm, which aims to learn a projection metric that considers the geometry of the data manifold. GEPML extends the Euclidean collaborative representation to the inter- and intra-class similarity graphs on the Grassmann manifold, capturing local structural information. The authors formulate the Grassmann dimensionality reduction problem by jointly learning the feature mapping and the similarity metric with a carefully designed regularization term. Mehrtash et al. [25] proposed a graph embedding-based discriminant analysis approach on the Grassmann manifold. By integrating inter-class and intra-class similarity graphs, this approach effectively harnesses the data’s geometric structure to capture both intra-class compactness and inter-class separability. SDMME learns the underlying intrinsic representation of raw data by constructing two sparse graphs based on structured dictionaries. It simultaneously learns multiple feature spaces to reduce bias from the same class subspace while maximizing distances to subspaces of other classes. Yang et al. [26] proposed a novel approach for identifying common hub nodes in individual brain networks. This method learns an orthogonality-constrained graph embedding, which resides on the Grassmann manifold, to represent the local topological features. They also developed an optimization scheme to find reliable hub nodes and population-based common hub nodes. These approaches reflect ongoing research in graph manifold embedding, addressing aspects such as dimensionality reduction, geometry awareness, discriminant analysis, sparsity, and identification of common structures across populations.

2.3. Information Entropy

The concept of Mutual Information is commonly used to quantify the level of interdependence between two random variables [27]. Due to the computational challenges associated with precisely calculating mutual information between high-dimensional random variables, researchers have introduced the Mutual Information Neural Estimator (MINE) [28]. MINE utilizes dual representations of the Kullback-Leibler divergence (KL) [29] to estimate mutual information. Subsequently, the Jensen-Shannon divergence (JSD) [30] and noise-contrastive estimation (NCE) [31] have been employed along with a learned network to estimate mutual information [32]. This method maximizes mutual information to enhance deep representations through unsupervised learning. To expand the estimation of mutual information to computations involving three variables across multiple networks, the second-order Deep Multiplex Infomax (HD-MI) approach was introduced [33]. The Deep Mutual Information Maximin (Deep-MIM) technique [34] deals with cross-modal clustering tasks by conserving shared information across multiple modalities while eliminating redundant information within individual modalities. In the context of unsupervised classification tasks, Mutual Information Networks have demonstrated promising results [35]. In online continual learning scenarios, Guo et al. [36] utilized mutual information estimation to learn more robust features and preserve knowledge from previous tasks.

3. Materials and Methods

3.1. Problem Definition

Few-shot class incremental learning (FSCIL) is a learning paradigm in which a model is trained continually on a sequential stream of few-shot labeled samples. It assumes that samples from old tasks are no longer accessible while new few-shot tasks are being learned. Formally, we denote the training set, label set, and test set by $X$, $Y$, and $Z$, respectively. Training proceeds over a sequence of labeled training datasets $X_{1}, X_{2}, \dots, X_{T}$, where $X_{t}$ is the training set for the $t$-th session, $Y_{t}$ is the corresponding label set, and $T$ is the total number of incremental learning sessions. In each incremental session the number of training samples is limited, and the task is described as “N-way K-shot”: the session's dataset contains $N$ classes with $K$ samples per class. In the base session, the model is first trained on a base task with a sufficient number of instances in the training set $X_{1}$ and the corresponding label set $Y_{1}$. The label sets of different sessions are disjoint, i.e., $\forall i, j$ with $i \neq j$, $Y_{i} \cap Y_{j} = \emptyset$. For each $t > 1$, the training set $X_{t}$ consists of few-shot samples for the new classes introduced in the $t$-th incremental learning session, so $X_{t}$ is smaller than $X_{1}$. For example, in a 10-way 5-shot setting, each incremental session (i.e., $t > 1$) contains ten new classes with only five training samples each.

In few-shot class incremental learning, we encounter two key challenges: (i) Limited availability of training data for new classes poses difficulties in effectively learning knowledge about these classes. (ii) We need to mitigate catastrophic forgetting in order to ensure the retention of previously learned knowledge. In this work, our primary objective is to address these challenges by emphasizing two key aspects. Firstly, we can project the center of each class onto the Grassmann manifold as patterns and calculate the distances between training samples and their corresponding class patterns. This approach preserves the geometric properties of each class, reduces the risk of overfitting, and enhances adaptability to new classes during the incremental learning process, thereby facilitating better learning of new classes. Secondly, taking into consideration that human knowledge does not exist in isolation but rather within a knowledge structure [17], our objective is to leverage the Grassmann manifold metric to construct a neighborhood graph. Additionally, we aim to preserve the inter-class structural information of the learned classes through the use of information entropy. This approach helps mitigate the occurrence of catastrophic forgetting during the incremental learning.

3.2. Overview

The proposed Grassmann Manifold and Information Entropy for Few-Shot Class Incremental Learning (GMIE-FSCIL) framework is presented in Figure 1, and the Grassmann Metric Learning module is presented in Figure 2. The base network $\Theta_{1}$ is initially trained on a large training dataset $X_{1}$ with the classical cross-entropy loss in Euclidean space. During the incremental sessions ($t > 1$), however, neural networks often struggle to learn new classes effectively because of the limited samples. To mitigate this issue, we propose the Grassmann Metric Learning (GML) module, which measures the distance between each training sample and its class pattern on the Grassmann manifold. This helps maintain the geometric properties of the samples, reduces the risk of overfitting, and enhances adaptability to new classes throughout incremental learning, thereby improving the acquisition of knowledge about new classes.

In incremental learning sessions, it is crucial not only to learn new knowledge but also to mitigate catastrophic forgetting. To this end, we designed the Graph Information Preserving (GIP) module, which preserves the adjacency relationships among previously learned classes and thus prevents the network from forgetting what it has learned. Specifically, we embed the network parameters $\Theta$ onto the Grassmann manifold by an orthogonalization method. This gives a better description of the linear subspace spanned by the parameters while preserving the learned knowledge, ultimately enhancing adaptability to new classes with few-shot samples. At each learning stage $t$, we use the Grassmann manifold metric to construct a neighborhood graph $G(A_{t}, V_{t})$ that describes the adjacency relationships among the learned classes. Here, $A$ represents the adjacency weights and $V$ represents the class centers. The neighborhood graph $G$ depicts the correlations between learned classes and is dynamically updated on the Grassmann manifold. We then employ information entropy to maintain the structural relationships encoded in the neighborhood graph $G$, further preserving the adjacency relationships among previously learned classes and alleviating catastrophic forgetting. The method therefore preserves both geometric characteristics and structural relationships throughout the incremental learning process.

3.3. Grassmann Metric Learning

In few-shot incremental learning, neural networks often struggle to learn new classes effectively because only a few samples are available. We leverage the linear-subspace properties of the Grassmann manifold to help the model acquire knowledge about new classes during the incremental sessions. The Grassmann manifold, a special type of Riemannian manifold, embeds $p$-dimensional linear subspaces of a $d$-dimensional Euclidean space. It is defined as $G(d, p) = \{X \in \mathbb{R}^{d \times p} : X^{T} X = I\}$, where $X$ denotes any point on the Grassmann manifold.
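As a minimal illustration of the constraint $X^{T} X = I$, the sketch below builds a $d \times p$ matrix with orthonormal columns via a reduced QR decomposition. The function name `random_subspace_basis` and the sizes `d=8`, `p=3` are hypothetical choices for the example, not details taken from the paper.

```python
import numpy as np

def random_subspace_basis(d, p, seed=0):
    """Return a d x p matrix whose columns are orthonormal (X^T X = I)."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((d, p))
    # Reduced QR: Q has shape (d, p) and orthonormal columns.
    Q, _ = np.linalg.qr(A)
    return Q

X = random_subspace_basis(d=8, p=3)
gram = X.T @ X  # should equal the 3 x 3 identity up to rounding
```

Any matrix obtained this way represents a $p$-dimensional subspace of $\mathbb{R}^{d}$ in the sense of the definition above.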

We have designed a Grassmann Metric Learning module that projects the pattern of each class, $\Theta_{fc}^{c}$, onto the Grassmann manifold. It measures the distance between each training sample and the corresponding class pattern on the Grassmann manifold, helping the model learn knowledge about new classes more effectively. Specifically, we employ the Householder transformation [37] to orthogonalize the pattern $\Theta_{fc}^{c}$ of each class, enabling its projection onto the Grassmann manifold:

$$P_{c} = I - 2 \frac{\Theta_{fc}^{c} (\Theta_{fc}^{c})^{T}}{\| \Theta_{fc}^{c} \|^{2}}, \quad c = 1, 2, \dots, C \tag{1}$$

where $P_{c}$ represents the projection of class $c$ on the Grassmann manifold, and $\{P_{c}\}_{c=1}^{C}$ denotes the set of all class patterns on the Grassmann manifold.
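Equation (1) has the form of a standard Householder reflection. The sketch below computes it for a single pattern vector and can be used to check that the result is symmetric and orthogonal. The helper name `householder` and the example vector are hypothetical.

```python
import numpy as np

def householder(theta):
    """Householder matrix I - 2 * v v^T / ||v||^2 for a non-zero vector v."""
    v = np.asarray(theta, dtype=float).ravel()
    norm_sq = v @ v
    if norm_sq == 0.0:
        raise ValueError("pattern vector must be non-zero")
    return np.eye(v.size) - 2.0 * np.outer(v, v) / norm_sq

theta = np.array([3.0, 4.0, 0.0])  # hypothetical class pattern
P = householder(theta)
```

The reflection maps the pattern vector to its negative and leaves every vector orthogonal to it unchanged.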

Subsequently, we calculate the distance between the convolutional feature of a sample, $F_{conv} x_{i}^{c}$, and its corresponding class pattern $P_{c}$, using the Euclidean-to-Grassmann metric [38]:

$$L_{D} = \sum_{i = 1}^{| X_{t} |} \| F_{conv} x_{i}^{c} - P_{c} P_{c}^{T} F_{conv} x_{i}^{c} \|_{F} . \tag{2}$$
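The residual in Equation (2) can be sketched as the distance between a feature vector and its orthogonal projection onto a subspace. The sketch assumes a $d \times p$ basis with orthonormal columns and $p < d$, which is an assumption made for illustration: for a full $d \times d$ orthogonal matrix, $P P^{T} = I$ and the residual would vanish. The function name and the toy data are hypothetical.

```python
import numpy as np

def subspace_residual(features, basis):
    """Sum over samples of ||f - B B^T f||_2, with samples as rows of `features`."""
    # Row-wise projection: (B B^T f)^T = f^T B B^T.
    projected = features @ basis @ basis.T
    return np.linalg.norm(features - projected, axis=1).sum()

# Hypothetical 3-D features and a 2-D subspace spanned by e1 and e2.
basis = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [0.0, 0.0]])
feats = np.array([[1.0, 2.0, 0.0],   # lies inside the subspace
                  [0.0, 0.0, 3.0]])  # orthogonal to the subspace
loss_d = subspace_residual(feats, basis)
```

In this toy case the first sample contributes nothing and the second contributes its full norm, so the residual grows as features drift away from the class subspace.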

By projecting the pattern of each class onto the Grassmann manifold and measuring the distance between each training sample and the pattern of its class on the manifold, we enhance the model’s ability to learn knowledge about new classes. We refer to this as the Grassmann Metric Learning module:

$$L_{GML} = L_{CE} + \beta L_{D} \tag{3}$$

The learning objective comprises the traditional cross-entropy loss $L_{CE}$ and the distance term $L_{D}$, with $\beta$ serving as a hyperparameter.
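A minimal sketch of how the two terms in Equation (3) combine is given below. The cross-entropy helper is a plain NumPy softmax cross-entropy, and the value `beta=0.1` and the precomputed distance `l_d=3.0` are hypothetical placeholders; the paper's actual hyperparameter value is not stated in this section.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy for integer class labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def gml_loss(logits, labels, l_d, beta):
    """L_GML = L_CE + beta * L_D, with L_D supplied by the caller."""
    return cross_entropy(logits, labels) + beta * l_d

logits = np.zeros((2, 4))   # uniform predictions over 4 classes
labels = np.array([0, 3])
total = gml_loss(logits, labels, l_d=3.0, beta=0.1)
```

With uniform logits the cross-entropy equals $\log 4$, so the example makes the relative weight of the distance term easy to read off.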

3.4. Graph Information Preserving

In few-shot class incremental learning, another key issue is the catastrophic forgetting of previously learned knowledge. It’s noteworthy that in the human brain, learned knowledge is not isolated; there exists a structure among the learned knowledge to maintain their relationships [17]. Therefore, we propose a Graph Information Preserving module to embed the incremental model parameters onto the Grassmann manifold, better compensating for the drift in incremental model parameters. Additionally, this module maintains the structural relationships among previously learned classes by information entropy, thereby mitigating the catastrophic forgetting of previously learned knowledge.

To begin with, we embed the parameters of each layer of the incremental model $\Theta$, denoted $W_{l}$, onto the Grassmann manifold. This requires orthogonalizing each layer’s parameters, where $W_{l}$ belongs to $\Theta_{l}$ and $l$ denotes the layer index (i.e., $l \in \{1, 2, 3, \dots\}$). We therefore introduce an orthogonalization function $H(\cdot)$ that orthogonalizes the parameters $W_{l}$ and embeds them onto the Grassmann manifold. Recall that an orthogonalized matrix $W_{l}$ satisfies $W_{l}^{T} W_{l} - I = 0$. For an arbitrary vector $z \in \mathbb{R}^{d}$, the orthogonalization function $H(W_{l}^{T} W_{l} - I, z)$ is defined as:

$$H(W_{l}^{T} W_{l} - I, z) = \left| \frac{\| (W_{l}^{T} W_{l} - I) z \|^{2}}{\| z \|^{2}} - 1 \right| . \tag{4}$$

Here, $H(W_{l}^{T} W_{l} - I, z) \leq \delta_{W_{l}^{T} W_{l} - I}$, and $\delta_{W_{l}^{T} W_{l} - I} \in (0, 1)$ is a small value [39]. To find the minimum of the function $H(W_{l}^{T} W_{l} - I, z)$, we first define the spectral norm of the matrix $W_{l}^{T} W_{l} - I$ as $\alpha(W_{l}^{T} W_{l} - I) = \sup_{z \in \mathbb{R}^{d}, z \neq 0} \frac{\| (W_{l}^{T} W_{l} - I) z \|}{\| z \|}$. Based on Equation (4), the problem of parameter orthogonalization can then be reformulated as minimizing the spectral norm $\alpha(W_{l}^{T} W_{l} - I)$ over all $L$ layers:

$$L_{emb} = \sum_{l = 1}^{L} H(W_{l}^{T} W_{l} - I, z) . \tag{5}$$
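The spectral norm $\alpha(W_{l}^{T} W_{l} - I)$ defined above can be estimated without a full decomposition. The sketch below uses power iteration, which is one common way to do this; whether the authors use it is not stated in the text, and the function names, iteration count, and example matrices are hypothetical.

```python
import numpy as np

def spectral_norm_power(A, n_iter=200, seed=0):
    """Estimate the spectral norm of a symmetric matrix A by power iteration."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(A.shape[1])
    for _ in range(n_iter):
        z = A @ z
        norm = np.linalg.norm(z)
        if norm == 0.0:          # A maps z to zero, e.g. A is exactly zero
            return 0.0
        z /= norm
    return np.linalg.norm(A @ z)

def orthogonality_gap(W):
    """Spectral norm of W^T W - I; zero when W has orthonormal columns."""
    A = W.T @ W - np.eye(W.shape[1])
    return spectral_norm_power(A)

# Hypothetical layers: one with orthonormal columns, one deliberately scaled.
W_orth = np.linalg.qr(np.random.default_rng(1).standard_normal((6, 3)))[0]
W_scaled = np.array([[2.0, 0.0],
                     [0.0, 1.0],
                     [0.0, 0.0]])
gap_orth = orthogonality_gap(W_orth)
gap_scaled = orthogonality_gap(W_scaled)
```

For `W_scaled`, $W^{T} W - I = \mathrm{diag}(3, 0)$, so the gap is 3, while the orthonormal layer gives a gap at rounding-error level.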

Embedding the parameters of the incremental model onto the Grassmann manifold helps retain the learned intra-class knowledge while enhancing the model's capacity to adapt to new tasks with few-shot samples.

In the next step, we establish structural relationships by constructing a neighborhood graph $G$ on the Grassmann manifold. In the $t$-th incremental session, a neighborhood graph $G(A_{t}, V_{t})$ is constructed to represent the structural relationships among the learned knowledge on the Grassmann manifold, with $A_{t}$ and $V_{t}$ denoting the corresponding adjacency weight matrix and node set. The node set is composed of the learned Grassmann embeddings of the fully-connected layer parameters for the classes available in the $t$-th session, $V_{t} = \{x_{1}^{t}, x_{2}^{t}, \dots, x_{C_{t}}^{t}\}$, where $x_{k}^{t}$ denotes the learned Grassmann embedding of the $k$-th class and $C_{t}$ is the total number of classes learned up to the $t$-th session. To extract the inter-class structural relationships from these embeddings, we calculate the adjacency weight $a_{i,j}$, which represents the correlation between two learned class embeddings $x_{i}$ and $x_{j}$, with $M = x_{j}^{T} x_{i}$:

$$\begin{aligned} a_{i,j} &= \| (x_{i} - x_{j} M)^{T} (x_{i} - x_{j} M) \|_{F} \\ &= \| I - x_{i}^{T} x_{j} M - M^{T} x_{j}^{T} x_{i} + M^{T} M \|_{F} \\ &= \| I - M^{T} M \|_{F} . \end{aligned} \tag{6}$$
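The simplification in Equation (6) relies on $x_{i}^{T} x_{i} = x_{j}^{T} x_{j} = I$ and $x_{i}^{T} x_{j} = M^{T}$. The sketch below evaluates both the first and the last line numerically so the identity can be checked; the function names and the random $6 \times 2$ embeddings are hypothetical.

```python
import numpy as np

def adjacency_weight(x_i, x_j):
    """Last line of Eq. (6): ||I - M^T M||_F with M = x_j^T x_i."""
    M = x_j.T @ x_i
    return np.linalg.norm(np.eye(M.shape[1]) - M.T @ M, ord="fro")

def adjacency_weight_long(x_i, x_j):
    """First line of Eq. (6): ||(x_i - x_j M)^T (x_i - x_j M)||_F."""
    M = x_j.T @ x_i
    R = x_i - x_j @ M
    return np.linalg.norm(R.T @ R, ord="fro")

# Hypothetical class embeddings with orthonormal columns.
rng = np.random.default_rng(0)
x_a = np.linalg.qr(rng.standard_normal((6, 2)))[0]
x_b = np.linalg.qr(rng.standard_normal((6, 2)))[0]
w_short = adjacency_weight(x_a, x_b)
w_long = adjacency_weight_long(x_a, x_b)
```

The weight is zero when the two embeddings span the same subspace and grows as the subspaces move apart.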

We use the constructed neighborhood graph to maintain the structural relationships among the classes learned in previous sessions. This is achieved by imposing constraints on the Grassmann embeddings of the learned classes. To prevent forgetting of previously learned classes, we calculate the information entropy between the $t$-th and $(t + 1)$-th sessions (i.e., $t > 1$) using the constructed neighborhood graph, thus preserving the structural relationships of the learned classes. During each learning session $t$, the relevant node set $V_{t} \in \mathbb{R}^{C_{t - 1} \times C_{t - 1}}$ can be retained efficiently. This retained data is used in later sessions to preserve the relative configuration of the Grassmann embeddings of the classes learned during the $t$-th stage. More specifically, in the $t$-th incremental session, where $\Theta_{t}$ is the matrix comprising the Grassmann embeddings of all learned classes, we can also calculate $A_{t - 1}$ from the preserved $V_{t - 1}$. Next, we apply mutual information maximization based on information entropy, which maintains the structural relationships among the different class embeddings:

$$I_{S} = H(G_{t}^{t - 1}) - H(G_{t}^{t - 1} \mid G_{t - 1}) \tag{7}$$

where $G_{t}^{t - 1}$ denotes the first $C_{t - 1}$ rows of $G_{t}$, which correspond to the classes learned in the $(t - 1)$-th session, $H(G_{t}^{t - 1})$ is the Shannon entropy of $G_{t}^{t - 1}$, and the conditional entropy $H(G_{t}^{t - 1} \mid G_{t - 1})$ is the entropy of $G_{t}^{t - 1}$ given $G_{t - 1}$. To solve Equation (7), we transform it into a mutual information maximization problem, which can be written as follows:

$$\begin{aligned} L_{GIP} &= I_{\Theta_{fc}}(G_{t}^{t - 1}; G_{t - 1}) \\ &= E_{J}\left(F(G_{t}^{t - 1}; G_{t - 1})\right) - \log E_{M}\left(e^{F(G_{t}^{t - 1}; G_{t - 1})}\right) \end{aligned} \tag{8}$$

where $F(\cdot)$ is a discriminant function, $E_{J}$ is the expectation over the joint distribution of $G_{t}^{t - 1}$ and $G_{t - 1}$, and $E_{M}$ is the expectation over the product of the marginal distributions of $G_{t}^{t - 1}$ and $G_{t - 1}$. In the Graph Information Preserving module, we optimize the fully-connected layer parameters $\Theta_{fc}$ through Grassmann embedding using this entropy-based approach to alleviate the catastrophic forgetting of learned knowledge.
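The second line of Equation (8) has the form of the Donsker-Varadhan lower bound on mutual information, which is the dual representation of the KL divergence used by MINE [28]. The sketch below evaluates that expression from two arrays of discriminant scores. In practice $F$ would be a learned network; here the scores are hypothetical numbers, and the function name is an assumption.

```python
import numpy as np

def dv_lower_bound(joint_scores, marginal_scores):
    """E_J[F] - log E_M[exp(F)] computed from arrays of discriminant scores."""
    joint_scores = np.asarray(joint_scores, dtype=float)
    marginal_scores = np.asarray(marginal_scores, dtype=float)
    # Subtract the max before exponentiating so large scores do not overflow.
    m = marginal_scores.max()
    log_mean_exp = m + np.log(np.mean(np.exp(marginal_scores - m)))
    return joint_scores.mean() - log_mean_exp

# Hypothetical scores: paired (joint) samples score higher than shuffled ones.
bound = dv_lower_bound(joint_scores=[2.0, 2.0], marginal_scores=[0.0, 0.0])
```

Maximizing this quantity pushes the discriminant to score matched graph rows from adjacent sessions above mismatched ones, which is how the structural relationships are preserved.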

4. Results

4.1. Dataset

We evaluated our method on three publicly available datasets: CIFAR100 [19], miniImageNet [20], and Caltech-UCSD Birds-200-2011 (CUB200) [18]. CIFAR100 [19] comprises 100 classes with a total of 60,000 RGB images of size $32 \times 32$; each class includes 500 training images and 100 test images. miniImageNet [20], a subset of ImageNet, contains 100 classes and 60,000 images of size $84 \times 84$ pixels. For both CIFAR100 [19] and miniImageNet [20], we divided the 100 classes into 60 base classes and 40 new classes. We used all training data from the 60 base classes to train a base network, and the 40 new classes were used for eight 5-way 5-shot incremental learning tasks. CUB200 [18] is a fine-grained classification dataset with 11,788 images distributed across 200 classes, each sized at $224 \times 224$. We split the 200 classes of CUB200 [18] into 100 base classes and 100 new classes. In the base session, we used all training samples from the 100 base classes. We then conducted ten 10-way 5-shot incremental tasks on the 100 new classes. Our split settings follow those of FSCIL [6] to ensure a fair comparison with other mainstream methods on the three datasets.
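The session structure described above can be sketched as a partition of class indices. The sketch assumes classes are assigned to sessions in contiguous index order, which is a simplification: the actual FSCIL [6] splits use fixed class lists that are not reproduced here. The function name `make_sessions` is hypothetical.

```python
def make_sessions(num_classes, num_base, way):
    """Split class indices into one base session plus N-way incremental sessions."""
    if (num_classes - num_base) % way != 0:
        raise ValueError("new classes must divide evenly into N-way sessions")
    sessions = [list(range(num_base))]  # base session
    for start in range(num_base, num_classes, way):
        sessions.append(list(range(start, start + way)))
    return sessions

# CIFAR100 / miniImageNet: 60 base classes, then eight 5-way sessions.
cifar_sessions = make_sessions(num_classes=100, num_base=60, way=5)
# CUB200: 100 base classes, then ten 10-way sessions.
cub_sessions = make_sessions(num_classes=200, num_base=100, way=10)

shots = 5
samples_per_cifar_session = 5 * shots   # 25 training images per incremental session
samples_per_cub_session = 10 * shots    # 50 training images per incremental session
```

Counting the base session, this gives nine sessions for CIFAR100 and miniImageNet and eleven for CUB200.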

4.2. Implementation Details

We adhere to the approach outlined in FSCIL [6] and employ ResNet20 as the foundational model for CIFAR100 [19]. For miniImageNet [20] and CUB200 [18], ResNet18 serves as the backbone. In order to maintain the neural network’s ability to extract features during incremental learning, we freeze the parameters of the convolutional layers in the incremental learning sessions. We employ SGD as the optimizer, initialize the learning rate to 0.1, and set the batch size to 128 during the base session. We employ various evaluation metrics to assess our proposed method comprehensively. “

{Acc}_{t}

” signifies the Top-1 accuracy in the t-th incremental session, while “Avg” denotes the average accuracy across sessions, calculated as

\sum_{t = 1}^{T} {Acc}_{t} / T

. “ΔFinal” represents the difference in

{Acc}_{T}

between our method and the compared method in the final session. “KR” corresponds to the knowledge retention rate, defined as

{Acc}_{T} / {Acc}_{1}

. All experiments are conducted using the PyTorch framework on an NVIDIA GeForce RTX 4090 GPU.
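
As a worked example of the three metrics, the following sketch applies the definitions above to the per-session accuracies reported for GMIE-FSCIL and EPRC [16] on miniImageNet in Table 1. The function names are illustrative, and KR is scaled by 100 to match the percentages shown in the tables.

```python
def avg_accuracy(accs):
    """Avg: mean Top-1 accuracy over all T sessions."""
    return sum(accs) / len(accs)


def knowledge_retention(accs):
    """KR: final-session accuracy relative to base-session accuracy, in percent."""
    return 100.0 * accs[-1] / accs[0]


def delta_final(ours, other):
    """Delta-Final: final-session accuracy gap between our method and another."""
    return ours[-1] - other[-1]


# Per-session accuracies (%) copied from Table 1 (miniImageNet).
gmie = [76.56, 72.14, 67.90, 64.32, 61.55, 58.83, 56.18, 54.52, 53.53]
eprc = [76.87, 71.58, 67.31, 64.15, 60.80, 57.52, 54.69, 52.96, 51.50]

print(round(avg_accuracy(gmie), 2))
print(round(knowledge_retention(gmie), 2))
print(round(delta_final(gmie, eprc), 2))
```

KR and ΔFinal computed this way agree with the 69.92 and +2.03 entries in Table 1. Avg recomputed from the rounded per-session values can differ from the reported figure in the last digit.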

4.3. Comparison with State-of-the-Art Methods

We compare GMIE-FSCIL with other mainstream methods on the miniImageNet [20], CIFAR100 [19], and CUB200 [18] datasets. Detailed results can be found in Table 1, Table 2 and Table 3. GMIE-FSCIL achieves the highest Avg and the highest final-session accuracy among the compared methods on all three datasets. In the final incremental session on the CUB200 dataset [18], GMIE-FSCIL outperforms the second-best method, GKEAL [45], by 1.79% (Table 3). This demonstrates the consistent learning capability of GMIE-FSCIL throughout the incremental learning process. We attribute this to the Grassmann Metric Learning module, which preserves learned intra-class knowledge and enhances adaptability to new tasks. Furthermore, our method improves the Avg metric by at least 1.27% over all the compared approaches on the CUB200 dataset [18].

This indicates that GMIE-FSCIL performs well across all incremental learning sessions. The results in terms of both Avg and ΔFinal support the efficacy of GMIE-FSCIL in preserving learned intra-class knowledge and enhancing adaptability to new tasks. GMIE-FSCIL also improves the KR metric by at least 1.97% on the CUB200 dataset [18] compared to all the other mainstream methods we evaluated. The KR metric is the ratio of the accuracy in the final incremental session to the accuracy in the base session, so a higher KR value signifies less knowledge loss during incremental learning. This finding suggests that combining the proposed Grassmann Metric Learning and Graph Information Preserving modules enhances few-shot classification performance, indicating a synergy between the two modules. Additionally, we computed the recall accuracy on all datasets, as shown in Table 4: GMIE-FSCIL improves on both the Baseline and CEC by at least 3.88 percentage points on every dataset. In conclusion, the performance of GMIE-FSCIL in our experiments supports its effectiveness in learning new class knowledge and mitigating catastrophic forgetting.

4.4. Effectiveness of the Proposed Modules

We conducted an ablation study of our proposed approach, evaluating the two modules we introduced: Grassmann Metric Learning (GML) and Graph Information Preserving (GIP). The results of these experiments are presented in Table 5, Table 6 and Table 7. During incremental learning sessions (t ≥ 1), the inclusion of GML serves to preserve the geometric characteristics of the samples, reduce the risk of overfitting, and improve adaptability to new classes over the course of incremental learning. Compared to the baseline on the CUB200 dataset [18], adding GML improves the KR and Avg metrics by 2.52% and 1.61%, respectively. This supports the efficacy of our Grassmann Metric Learning module.

We employ the GIP to preserve network parameters with orthogonality constraints, embedding them onto the Grassmann manifold. This module preserves their learned intra-class knowledge and enhances adaptability to new tasks while maximizing the preservation of the structural relationship by information entropy between the current neighborhood graph

G_{t}

and the previous neighborhood graph

G_{t - 1}

on the Grassmann manifold in incremental learning. The GIP yields substantial performance enhancements of 2.02% and 2.60% in terms of the Avg and KR metrics on the CUB200 dataset [18], respectively. The outcomes demonstrate the effectiveness of our GIP approach in preserving the learned knowledge structure by constructing the neighborhood graph

G

on the Grassmann manifold, thus mitigating catastrophic forgetting. These results indicate that the structural relationships between classes are maintained throughout the incremental learning process, which supports the efficacy of the proposed GIP and of the overall framework.
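
Both modules rely on treating an orthonormal basis as a point on the Grassmann manifold, where two bases that span the same subspace represent the same point. The sketch below illustrates this with the projection metric, a standard Grassmann distance. Whether the authors use exactly this metric is an assumption, and all function names here are hypothetical.

```python
import math


def orthonormalize(columns):
    """Modified Gram-Schmidt: orthonormal basis spanning the given column vectors."""
    basis = []
    for v in columns:
        w = list(v)
        for q in basis:
            coeff = sum(wi * qi for wi, qi in zip(w, q))
            w = [wi - coeff * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm < 1e-12:
            raise ValueError("columns are linearly dependent")
        basis.append([wi / norm for wi in w])
    return basis


def projection_matrix(basis):
    """P = Y Y^T, where the orthonormal basis Y is stored as a list of vectors."""
    n = len(basis[0])
    return [[sum(q[i] * q[j] for q in basis) for j in range(n)] for i in range(n)]


def projection_distance(cols_a, cols_b):
    """Projection metric: Frobenius norm of (P_a - P_b), divided by sqrt(2)."""
    pa = projection_matrix(orthonormalize(cols_a))
    pb = projection_matrix(orthonormalize(cols_b))
    n = len(pa)
    squared = sum((pa[i][j] - pb[i][j]) ** 2 for i in range(n) for j in range(n))
    return math.sqrt(squared / 2.0)


# Two different bases of the same plane in R^3 give (numerically) zero distance.
same_plane = projection_distance([[1, 0, 0], [0, 1, 0]], [[1, 1, 0], [1, -1, 0]])
# Two orthogonal lines are at distance one.
orthogonal_lines = projection_distance([[1, 0, 0]], [[0, 1, 0]])
print(same_plane, orthogonal_lines)
```

Because the distance depends only on the spanned subspace and not on the particular basis, it is invariant to rotations of the basis vectors within the subspace, which is the geometric property that a Euclidean distance between parameter matrices lacks.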

4.5. Ablation Study

Ablation studies have been conducted to assess the efficacy of our proposed modules as well as some experimental settings.

4.5.1. Confusion Matrix of GMIE-FSCIL and Baseline

In Figure 3, we present the confusion matrices for both the baseline and our proposed GMIE-FSCIL on three datasets. As evident, the diagonal elements of our GMIE-FSCIL exhibit a darker shade compared to those of the baseline. This indicates that GMIE-FSCIL is more proficient at learning new classes. Furthermore, it suggests that the prediction distribution of GMIE-FSCIL for new classes is more focused compared to that of the baseline. These results offer a visual demonstration of the superiority of GMIE-FSCIL in preserving previously learned knowledge and corroborate the effectiveness of our proposed framework.

4.5.2. Visualization Results of GMIE-FSCIL and Baseline

In Figure 4, we present the visualization results of both the baseline and our proposed GMIE-FSCIL. GMIE-FSCIL localizes the target features more clearly in the visualization results. This suggests that GMIE-FSCIL pays more attention to the target features of the test samples and less attention to the background features than the baseline does. These results illustrate the ability of GMIE-FSCIL to learn knowledge effectively and support the effectiveness of our proposed framework.

4.5.3. The Shot Number in the Incremental Learning

In Figure 5, we present accuracy curves for various numbers of samples per class across the three datasets. The experimental outcomes show that incorporating more samples leads to clear enhancements in the performance of FSCIL. This is because more samples per class provide a more robust class attribute distribution and more accurate class relationships, which facilitate the learning of new knowledge. Furthermore, once the number of samples surpasses 15, the performance improvement becomes marginal. This observation implies that achieving satisfactory performance requires only a modest number of newly labeled samples, which supports the effectiveness of our GMIE-FSCIL in addressing the few-shot class incremental learning task.

4.5.4. Different Methods with Similar Starting Performance

Table 8 displays a comparison between GMIE-FSCIL and five other methods on the CIFAR100 dataset [19], where the base-session accuracy is fixed at approximately 74.5%. The results indicate that when the base model performance is fixed, GMIE-FSCIL outperforms the other methods in all three evaluation metrics (KR, ΔFinal, and Avg). This suggests that GMIE-FSCIL excels in preserving previously learned knowledge, and that the performance boost can be attributed to the introduced knowledge preservation rather than solely to the base model performance. These findings support the effectiveness of GMIE-FSCIL in efficiently learning new knowledge while mitigating catastrophic forgetting.

4.5.5. Accuracy of the New Classes for Each Incremental Learning Session

To assess the effectiveness of GMIE-FSCIL in learning knowledge about new classes, we display the new class accuracy of both the baseline and GMIE-FSCIL framework at each session of incremental learning in Figure 6. The experimental results unequivocally show that, in comparison to the baseline, our GMIE-FSCIL consistently achieves substantial improvements in new class learning accuracy at every session during the incremental learning process. This indicates that within GMIE-FSCIL, the Grassmann Metric Learning module effectively maintains the geometric attributes of the samples, reduces the risk of overfitting, and enhances adaptability to new classes. Meanwhile, the Graph Information Preserving module adeptly preserves knowledge pertaining to new classes. These findings strongly affirm the superior performance of GMIE-FSCIL in efficiently learning new knowledge and mitigating catastrophic forgetting.

5. Conclusions

In this paper, we present the Grassmann Manifold and Information Entropy for Few-Shot Class Incremental Learning (GMIE-FSCIL) framework. This framework addresses the challenge of few-shot class incremental learning by preserving class geometric properties to enhance adaptability to new classes and by maintaining structural relationships on the Grassmann manifold using information entropy. The proposed framework enables deep models to learn new classes incrementally from limited labeled data while mitigating catastrophic forgetting of learned knowledge. We introduce the Grassmann Metric Learning module, which capitalizes on the linear subspace attributes of the Grassmann manifold to effectively exploit its geometric properties. The Grassmann Metric Learning module aids in mitigating overfitting risks and enhancing adaptability to new classes throughout the incremental learning process. We also introduce the Graph Information Preserving module, which establishes a neighborhood graph on the Grassmann manifold. This module aims to uphold the evolving knowledge structure through information entropy. Qualitative and quantitative experimental results on three datasets demonstrate that GMIE-FSCIL excels in learning new knowledge while mitigating catastrophic forgetting during the incremental learning phase.

Author Contributions

Conceptualization, Z.G., and C.X.; methodology, Z.G.; software, C.H.; validation, Z.G., C.H., and C.X.; formal analysis, C.X.; investigation, Z.G. and C.H.; resources, Z.G.; data curation, C.H.; writing—original draft preparation, Z.G.; writing—review and editing, C.X., Z.L. and C.H.; visualization, Z.G., Z.L.; supervision, C.X., and C.H.; project administration, C.X.; and funding acquisition, C.X. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (Grants Nos. 61972204, 62072244), the fundamental research funds for the central universities under Grant 30919011232, the Natural Science Foundation of Shandong Province (Grant Nos. ZR2020LZH008, ZR2022LZH003), and in part by State Key Laboratory of High-end Server & Storage Technology.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

CIFAR100: https://www.cs.toronto.edu/~kriz/cifar.html (accessed on 14 September 2023); miniImageNet: https://github.com/yaoyao-liu/mini-imagenet-tools (accessed on 14 September 2023); CUB200: https://opendatalab.com/OpenDataLab/CUB-200-2011 (accessed on 14 September 2023).

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. In Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012; Volume 25.
2. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
3. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778.
4. Rebuffi, S.A.; Kolesnikov, A.; Sperl, G.; Lampert, C.H. icarl: Incremental classifier and representation learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2001–2010.
5. Mazumder, P.; Singh, P.; Rai, P. Few-shot lifelong learning. Proc. AAAI Conf. Artif. Intell. 2021, 35, 2337–2345.
6. Tao, X.; Hong, X.; Chang, X.; Dong, S.; Wei, X.; Gong, Y. Few-shot class-incremental learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 12183–12192.
7. Zhang, C.; Song, N.; Lin, G.; Zheng, Y.; Pan, P.; Xu, Y. Few-shot incremental learning with continually evolved classifiers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; p. 12.
8. Dong, S.; Hong, X.; Tao, X.; Chang, X.; Wei, X.; Gong, Y. Few-shot class-incremental learning via relation knowledge distillation. Proc. AAAI Conf. Artif. Intell. 2021, 35, 1255–1263.
9. Zhou, D.W.; Wang, F.Y.; Ye, H.J.; Ma, L.; Pu, S.; Zhan, D.C. Forward compatible few-shot class-incremental learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 9046–9056.
10. Cheraghian, A.; Rahman, S.; Fang, P.; Roy, S.K.; Petersson, L.; Harandi, M. Semantic-aware knowledge distillation for few-shot class-incremental learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 2534–2543.
11. Chi, Z.; Gu, L.; Liu, H.; Wang, Y.; Yu, Y.; Tang, J. Metafscil: A meta-learning approach for few-shot class incremental learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 14166–14175.
12. Javed, K.; White, M. Meta-learning representations for continual learning. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; Volume 32.
13. Kalla, J.; Biswas, S. S3C: Self-supervised stochastic classifiers for few-shot class-incremental learning. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer: Cham, Switzerland, 2022; pp. 432–448.
14. Cheraghian, A.; Rahman, S.; Ramasinghe, S.; Fang, P.; Simon, C.; Petersson, L.; Harandi, M. Synthesized feature based few-shot class-incremental learning on a mixture of subspaces. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Nashville, TN, USA, 20–25 June 2021; pp. 8661–8670.
15. Zhu, K.; Cao, Y.; Zhai, W.; Cheng, J.; Zha, Z.J. Self-promoted prototype refinement for few-shot class-incremental learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 6801–6810.
16. Zhang, W.; Gu, X. Few Shot Class Incremental Learning via Efficient Prototype Replay and Calibration. Entropy 2023, 25, 776.
17. Eichenbaum, H. How does the brain organize memories? Science 1997, 277, 330–332.
18. Wah, C.; Branson, S.; Welinder, P.; Perona, P.; Belongie, S. The Caltech-UCSD Birds-200-2011 Dataset; California Institute of Technology: Pasadena, CA, USA, 2011.
19. Krizhevsky, A.; Hinton, G. Learning Multiple Layers of Features from Tiny Images; University of Toronto: Toronto, ON, Canada, 2009.
20. Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. Imagenet large scale visual recognition challenge. Int. J. Comput. Vis. 2015, 115, 211–252.
21. Hersche, M.; Karunaratne, G.; Cherubini, G.; Benini, L.; Sebastian, A.; Rahimi, A. Constrained few-shot class-incremental learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 9057–9067.
22. Oveis, A.H.; Giusti, E.; Ghio, S.; Meucci, G.; Martorella, M. Incremental Learning in Synthetic Aperture Radar Images Using Openmax Algorithm. In Proceedings of the 2023 IEEE Radar Conference (RadarConf23), San Antonio, TX, USA, 1–5 May 2023; pp. 1–6.
23. Yan, S.; Xu, D.; Zhang, B.; Zhang, H.J.; Yang, Q.; Lin, S. Graph embedding and extensions: A general framework for dimensionality reduction. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 29, 40–51.
24. Wang, R.; Wu, X.J.; Liu, Z.; Kittler, J. Geometry-aware graph embedding projection metric learning for image set classification. IEEE Trans. Cogn. Dev. Syst. 2021, 14, 957–970.
25. Harandi, M.T.; Sanderson, C.; Shirazi, S.; Lovell, B.C. Graph embedding discriminant analysis on Grassmannian manifolds for improved image set matching. In Proceedings of the CVPR 2011, Colorado Springs, CO, USA, 20–25 June 2011; pp. 2705–2712.
26. Yang, D.; Chen, J.; Yan, C.; Kim, M.; Laurienti, P.J.; Styner, M.; Wu, G. Group-wise hub identification by learning common graph embeddings on Grassmannian manifold. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 8249–8260.
27. Kinney, J.B.; Atwal, G.S. Equitability, mutual information, and the maximal information coefficient. Proc. Natl. Acad. Sci. USA 2014, 111, 3354–3359.
28. Belghazi, M.I.; Baratin, A.; Rajeshwar, S.; Ozair, S.; Bengio, Y.; Courville, A.; Hjelm, D. Mutual information neural estimation. In Proceedings of the International Conference on Machine Learning, PMLR, Stockholm, Sweden, 10–15 July 2018; pp. 531–540.
29. Donsker, M.D.; Varadhan, S.R.S. On a variational formula for the principal eigenvalue for operators with maximum principle. Proc. Natl. Acad. Sci. USA 1975, 72, 780–783.
30. Nowozin, S.; Cseke, B.; Tomioka, R. f-gan: Training generative neural samplers using variational divergence minimization. In Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; Volume 29.
31. Gutmann, M.; Hyvärinen, A. Noise-contrastive estimation: A new estimation principle for unnormalized statistical models. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, JMLR Workshop and Conference Proceedings, Sardinia, Italy, 13–15 May 2010; pp. 297–304.
32. Hjelm, R.D.; Fedorov, A.; Lavoie-Marchildon, S.; Grewal, K.; Bachman, P.; Trischler, A.; Bengio, Y. Learning deep representations by mutual information estimation and maximization. arXiv 2018, arXiv:1808.06670.
33. Jing, B.; Park, C.; Tong, H. Hdmi: High-order deep multiplex infomax. In Proceedings of the Web Conference 2021, Ljubljana, Slovenia, 19–23 April 2021; pp. 2414–2424.
34. Mao, Y.; Yan, X.; Guo, Q.; Ye, Y. Deep mutual information maximin for cross-modal clustering. Proc. AAAI Conf. Artif. Intell. 2021, 35, 8893–8901.
35. Bachman, P.; Hjelm, R.D.; Buchwalter, W. Learning representations by maximizing mutual information across views. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; Volume 32.
36. Guo, Y.; Liu, B.; Zhao, D. Online continual learning through mutual information maximization. In Proceedings of the International Conference on Machine Learning, PMLR, Baltimore, MD, USA, 17–23 July 2022; pp. 8109–8126.
37. Huang, H.; Zhou, X.; He, R. Orthogonal transformer: An efficient vision transformer backbone with token orthogonalization. Adv. Neural Inf. Process. Syst. 2022, 35, 14596–14607.
38. Huang, Z.; Wang, R.; Shan, S.; Chen, X. Learning euclidean-to-riemannian metric for point-to-set classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 1677–1684.
39. Bansal, N.; Chen, X.; Wang, Z. Can we gain more from orthogonality regularizations in training deep networks? In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 3–8 December 2018; Volume 31.
40. Castro, F.M.; Marín-Jiménez, M.J.; Guil, N.; Schmid, C.; Alahari, K. End-to-end incremental learning. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 233–248.
41. Hou, S.; Pan, X.; Loy, C.C.; Wang, Z.; Lin, D. Learning a unified classifier incrementally via rebalancing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 831–839.
42. Liu, B.; Cao, Y.; Lin, Y.; Li, Q.; Zhang, Z.; Long, M.; Hu, H. Negative margin matters: Understanding margin in few-shot classification. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part IV 16. Springer: Berlin/Heidelberg, Germany, 2020; pp. 438–455.
43. Vinyals, O.; Blundell, C.; Lillicrap, T.; Wierstra, D. Matching networks for one shot learning. In Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; Volume 29.
44. Zhang, C.; Cai, Y.; Lin, G.; Shen, C. Deepemd: Few-shot image classification with differentiable earth mover’s distance and structured classifiers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 12203–12213.
45. Zhuang, H.; Weng, Z.; He, R.; Lin, Z.; Zeng, Z. GKEAL: Gaussian Kernel Embedded Analytic Learning for Few-Shot Class Incremental Task. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 7746–7755.

Figure 1. The Grassmann Manifold and Information Entropy for Few-Shot Class Incremental Learning pipeline involves multiple sessions denoted by t (

t \geq 1

), with network parameters represented as

Θ_{t}

for different incremental sessions (

Θ_{1}, Θ_{2}, Θ_{3}, \dots

). First, we incorporate the Grassmann Metric Learning module to enhance the model’s learning capacity by measuring the distance between sample features and class patterns. Second, the GIP module constructs a neighborhood graph

G (A_{t}, V_{t})

on the Grassmann manifold, using this graph to preserve the knowledge structure of previously learned classes through information entropy.

Figure 2. Graph Information Preserving module.

Figure 3. The confusion matrix illustrates the final classification outcomes on the three datasets. The division between the base class and the new class is delineated by red lines. This approach improves the network’s predictive capacity, leading to a less dispersed confusion matrix.

Figure 4. The visualization results of GMIE-FSCIL and baseline are presented, clearly demonstrating that GMIE-FSCIL exhibits a superior ability to observe target features.

Figure 5. Performance comparison with different numbers of shots (K) on three datasets.

Figure 6. Performance of new classes in each incremental learning session on three datasets.

Table 1. Performance comparison between GMIE-FSCIL and other mainstream methods on the miniImageNet dataset [20]. Results marked with * were obtained from the authors’ published code.

Methods	Accuracy in Each Session (%) ↑									KR ↑	ΔFinal ↑	Avg ↑
Methods	1	2	3	4	5	6	7	8	9	KR ↑	ΔFinal ↑	Avg ↑
Ft-CNN	61.31	27.22	16.37	6.08	2.54	1.56	1.93	2.60	1.40	2.28	+52.13	13.44
iCaRL [4]	61.31	46.32	42.94	37.63	30.49	24.00	20.89	18.80	17.21	28.07	+36.32	33.29
EEIL [40]	61.31	46.58	44.00	37.29	33.14	27.12	24.10	21.57	19.58	31.93	+33.95	34.97
TOPIC [6]	61.31	50.09	45.17	41.16	37.48	35.52	32.19	29.46	24.42	39.83	+29.11	39.65
NCM [41]	61.31	47.80	39.31	31.91	25.68	21.35	18.67	17.24	14.17	23.11	+39.36	30.83
SPPR [15]	61.45	63.80	59.53	55.53	52.50	49.60	46.69	43.79	41.92	68.21	+11.61	54.52
D-NegCosine [42]	71.68	66.64	62.57	58.82	55.91	52.88	49.41	47.50	45.81	63.90	+7.72	56.80
D-Cosine [43]	70.37	65.45	61.41	58.00	54.81	51.89	49.10	47.27	45.63	64.84	+7.90	55.99
D-DeepEMD [44]	69.77	64.59	60.21	56.63	53.16	50.13	47.79	45.42	43.41	62.21	+10.12	54.57
CEC [7]	72.00	66.83	62.97	59.43	56.70	53.73	51.19	49.24	47.63	66.15	+5.90	57.75
MetaFSCIL [11]	72.04	67.94	63.77	60.29	57.58	55.16	52.90	50.79	49.19	68.28	+4.34	58.85
C-FSCIL [21]	76.40	71.14	66.46	63.29	60.42	57.46	54.78	53.11	51.41	67.29	+2.12	61.61
FACT * [9]	75.68	70.65	66.53	62.75	59.39	56.19	53.26	51.10	49.48	65.38	+4.05	60.56
GKEAL [45]	73.59	68.90	65.33	62.29	59.39	56.70	54.20	52.59	51.13	69.48	+2.40	60.46
EPRC [16]	76.87	71.58	67.31	64.15	60.80	57.52	54.69	52.96	51.50	66.99	+2.03	61.62
GMIE-FSCIL	76.56	72.14	67.90	64.32	61.55	58.83	56.18	54.52	53.53	69.92	-	62.83

Table 2. Performance comparison between GMIE-FSCIL and other mainstream methods on the CIFAR100 dataset [19]. Results marked with * were obtained from the authors’ published code.

Methods	Accuracy in Each Session (%) ↑									KR ↑	ΔFinal ↑	Avg ↑
Methods	1	2	3	4	5	6	7	8	9	KR ↑	ΔFinal ↑	Avg ↑
Ft-CNN	64.10	36.91	15.37	9.80	6.67	3.80	3.70	3.14	2.65	4.13	+51.43	16.24
iCaRL [4]	64.10	53.28	41.69	34.13	27.93	25.06	20.41	15.48	13.73	21.41	+40.35	32.87
NCM [41]	64.10	53.05	43.96	36.97	31.61	26.73	21.23	16.78	13.54	21.12	+40.54	34.22
EEIL [40]	64.10	53.11	43.71	35.15	28.96	24.98	21.01	17.26	15.85	24.72	+38.23	33.79
TOPIC [6]	64.10	55.88	47.07	45.16	40.11	36.38	33.96	31.55	29.37	45.81	+24.71	42.62
SPPR [15]	64.10	65.86	61.36	57.45	53.69	50.75	48.58	45.66	43.25	67.47	+2.43	54.52
D-DeepEMD [44]	69.75	65.06	61.20	57.21	53.88	51.40	48.80	46.84	44.41	63.67	+9.67	55.39
D-NegCosine [42]	74.36	68.23	62.84	59.24	55.32	52.88	50.86	48.98	46.66	62.73	+7.42	57.71
D-Cosine [43]	74.55	67.43	63.63	59.55	56.11	53.80	51.68	49.67	47.68	63.95	+6.40	58.23
CEC [7]	73.07	68.88	65.26	61.19	58.09	55.57	53.22	51.34	49.14	67.25	+4.94	59.53
MetaFSCIL [11]	74.50	70.10	66.84	62.77	59.48	56.52	54.36	52.56	49.97	67.07	+4.11	60.79
C-FSCIL [21]	77.47	72.40	67.47	63.25	59.84	56.95	54.42	52.47	50.47	65.14	+3.52	61.64
FACT * [9]	78.44	72.33	68.23	63.90	60.58	58.20	55.96	53.59	51.32	65.43	+2.76	62.51
GKEAL [45]	74.01	70.45	67.01	63.08	60.10	57.30	55.50	53.39	51.40	69.45	+2.68	61.36
EPRC [16]	77.02	72.25	67.70	63.29	59.50	56.67	54.51	52.62	50.98	66.21	+3.10	61.61
GMIE-FSCIL	77.37	74.20	70.04	66.13	62.84	60.08	58.31	56.17	54.08	69.90	-	64.36

Table 3. Performance comparison between GMIE-FSCIL and other mainstream methods on the CUB200 dataset [18]. Results marked with * were obtained from the authors’ published code.

Methods	Accuracy in Each Session (%) ↑											KR ↑	ΔFinal ↑	Avg ↑
Methods	1	2	3	4	5	6	7	8	9	10	11	KR ↑	ΔFinal ↑	Avg ↑
Ft-CNN	68.68	43.70	25.05	17.72	18.08	16.95	15.10	10.60	8.93	8.93	8.47	12.33	+51.99	22.02
NCM [41]	68.68	57.12	44.21	28.78	26.71	25.66	24.62	21.52	20.12	20.06	19.87	28.93	+40.59	32.49
iCaRL [4]	68.68	52.65	48.61	44.16	36.62	29.52	27.83	26.26	24.01	23.89	21.16	30.80	+39.30	36.67
EEIL [40]	68.68	53.63	47.91	44.20	36.30	27.46	25.93	24.70	23.95	24.13	22.11	32.19	+38.35	36.27
TOPIC [6]	68.68	62.49	54.81	49.99	45.25	41.40	38.35	35.36	32.22	28.31	26.28	38.26	+34.18	43.92
SPPR [15]	68.68	61.85	57.43	52.68	50.19	46.88	44.65	43.07	40.17	39.63	37.33	54.35	+23.13	49.32
D-DeepEMD [44]	75.35	70.69	66.68	62.34	59.76	56.54	54.61	52.52	50.73	49.20	47.60	63.17	+12.86	58.73
D-NegCosine [42]	74.96	70.57	66.62	61.32	60.09	56.06	55.03	52.78	51.50	50.08	48.47	64.66	+11.99	58.86
D-Cosine [43]	75.52	70.95	66.46	61.20	60.86	56.88	55.40	53.49	51.94	50.93	49.31	65.29	+11.15	59.36
CEC [7]	75.80	71.94	68.50	63.50	62.43	58.27	57.73	55.81	54.83	53.52	52.28	68.97	+8.18	61.33
MetaFSCIL [11]	75.90	72.41	68.78	64.78	62.96	59.99	58.30	56.85	54.78	53.83	52.64	69.35	+7.19	61.93
FACT * [9]	77.38	73.91	70.32	65.91	65.02	61.82	61.29	59.53	57.92	57.63	56.46	72.95	+4.80	64.29
GKEAL [45]	78.88	75.62	72.32	68.62	67.23	64.26	62.98	61.89	60.20	59.21	58.67	74.38	+1.79	66.35
GMIE-FSCIL	79.19	76.26	72.54	68.98	68.49	65.86	64.80	63.56	62.26	61.47	60.46	76.35	-	67.62

Table 4. The recall accuracy of GMIE-FSCIL and other methods at the final session on all datasets.

	miniImageNet	CIFAR100	CUB200
CEC [7]	46.34%	48.58%	50.89%
Baseline	48.98%	48.23%	54.92%
GMIE-FSCIL	52.86%	52.62%	59.34%

Table 5. Ablation of the proposed modules (GML and GIP) on the miniImageNet dataset.

Baseline	GML	GIP	Accuracy in Each Session (%) ↑									KR	Avg
Baseline	GML	GIP	1	2	3	4	5	6	7	8	9	KR	Avg
√			75.20	69.12	64.71	61.49	58.79	56.13	53.48	51.70	50.34	66.94	60.11
√	√		76.13	70.92	66.69	63.23	60.73	57.88	55.29	53.60	52.27	68.66	61.86
√		√	75.77	71.31	66.86	63.39	60.85	58.05	55.14	53.30	52.43	69.19	61.90
√	√	√	76.56	72.14	67.90	64.32	61.55	58.83	56.18	54.52	53.53	69.92	62.83

Table 6. Ablation of the proposed modules (GML and GIP) on the CIFAR100 dataset.

Baseline	GML	GIP	Accuracy in Each Session (%) ↑									KR	Avg
Baseline	GML	GIP	1	2	3	4	5	6	7	8	9	KR	Avg
√			76.57	71.83	67.73	63.76	60.30	57.29	55.08	52.93	50.63	66.12	61.79
√	√		76.94	72.51	68.01	64.11	60.70	57.60	55.59	53.56	51.26	66.62	62.25
√		√	76.88	73.15	68.96	64.92	61.73	58.85	56.70	54.51	52.20	67.89	63.10
√	√	√	77.37	74.20	70.04	66.13	62.84	60.08	58.31	56.17	54.08	69.90	64.36

Table 7. Ablation of the proposed modules (GML and GIP) on the CUB200 dataset.

Baseline	GML	GIP	Accuracy in Each Session (%) ↑											KR	Avg
Baseline	GML	GIP	1	2	3	4	5	6	7	8	9	10	11	KR	Avg
√			77.82	74.79	70.47	65.87	66.07	63.07	61.71	59.91	58.91	57.66	56.44	72.53	64.79
√	√		78.45	75.42	72.29	67.64	67.54	64.26	63.45	61.79	60.83	59.82	58.88	75.05	66.40
√		√	78.73	76.01	72.52	67.81	67.63	64.94	64.40	62.33	61.25	60.10	59.15	75.13	66.81
√	√	√	79.19	76.26	72.54	68.98	68.49	65.86	64.80	63.56	62.26	61.47	60.46	76.35	67.62

Table 8. Comparison between GMIE-FSCIL and other methods on the CIFAR100 dataset when all methods start from a similar base-model performance. Columns 1–9 give the accuracy (%) in each session; higher is better.

Methods	1	2	3	4	5	6	7	8	9	KR ↑	ΔFinal ↑	Avg ↑
D-NegCosine [42]	74.36	68.23	62.84	59.24	55.32	52.88	50.86	48.98	46.66	62.73	+5.36	57.71
D-Cosine [43]	74.55	67.43	63.63	59.55	56.11	53.80	51.68	49.67	47.68	63.95	+4.34	58.23
CEC [7]	73.07	68.88	65.26	61.19	58.09	55.57	53.22	51.34	49.14	67.25	+2.88	59.53
MetaFSCIL [11]	74.50	70.10	66.84	62.77	59.48	56.52	54.36	52.56	49.97	67.07	+2.05	60.79
GKEAL [45]	74.01	70.45	67.01	63.08	60.10	57.30	55.50	53.39	51.40	69.45	+0.62	61.36
GMIE-FSCIL	74.65	72.19	67.94	64.19	61.28	58.52	56.47	54.18	52.02	69.69	-	62.38
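
The ΔFinal column of Table 8 appears to be the gap between the final-session accuracy of GMIE-FSCIL and that of each compared method. The short sketch below recomputes it under that assumption; the helper name `delta_final` is hypothetical, and the values are the session-9 accuracies copied from the table.

```python
# Final-session (session 9) accuracies from Table 8, in percent.
final_acc = {
    "D-NegCosine": 46.66,
    "D-Cosine": 47.68,
    "CEC": 49.14,
    "MetaFSCIL": 49.97,
    "GKEAL": 51.40,
    "GMIE-FSCIL": 52.02,
}


def delta_final(final_acc, ours="GMIE-FSCIL"):
    """Assumed definition: our final accuracy minus each other method's."""
    return {
        method: round(final_acc[ours] - acc, 2)
        for method, acc in final_acc.items()
        if method != ours
    }


for method, gap in delta_final(final_acc).items():
    print(f"{method}: +{gap:.2f}")
```

Under this reading, the recomputed gaps match the tabulated values, for example +5.36 for D-NegCosine and +0.62 for GKEAL.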

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
