1. Introduction
Deep learning (DL) is already ubiquitous in our daily lives, including image-based object detection [1], face recognition [2], medical imaging, and healthcare [3]. While DL outperforms traditional machine learning methods in these application areas [4], a major downside of DL is that it requires large amounts of data to achieve good performance [5]. Few-shot learning (FSL) is a subfield of DL that focuses on training DL models under scarce data regimes, thereby opening possibilities for applying DL to new problem areas where the amount of labeled data is limited.
In FSL settings, datasets comprise large numbers of categories (i.e., class labels), but only a few examples per class are available. The main objective of FSL is the design of methods that achieve good generalization performance from the limited number of examples per category. The overarching concept of FSL is very general and applies to different data modalities and tasks such as image classification [6], object detection [7], and text classification [8]. However, most FSL research focuses on image classification, so we will use the terms examples and images (in a supervised learning context) interchangeably.
Most FSL methods use an episodic training strategy known as meta-learning [9], where a meta-learner is trained on (classification) tasks with the goal of learning to perform well on new, unseen tasks. Many of the most recent FSL methods are based on episodic meta-learning, such as prototypical networks [10], relation networks [11], model-agnostic meta-learning frameworks [12], and LSTM-based meta-learning [13]. Another successful approach to FSL is transfer learning, where models are trained on large datasets and then appropriately transferred to smaller datasets that contain the novel target classes; examples include weight imprinting [14], dynamic few-shot object recognition with attention modules [15], and few-shot image classification by predicting parameters from activation values [16].
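To make the episodic training setup concrete, the following is a minimal sketch of how an N-way K-shot episode could be sampled from a labeled dataset. It is illustrative only; the function name and argument defaults are our own placeholders, not the exact sampler used in this work.

```python
import random
from collections import defaultdict

def sample_episode(labels, n_way=5, k_shot=1, q_queries=15):
    """Sample one N-way K-shot episode (illustrative sketch).

    labels: list mapping each example index to its class label.
    Returns support and query sets as lists of (example index, episode label).
    """
    # Group example indices by their class label.
    by_class = defaultdict(list)
    for idx, lbl in enumerate(labels):
        by_class[lbl].append(idx)

    # Pick N classes for this episode and relabel them 0..N-1.
    episode_classes = random.sample(list(by_class), n_way)
    support, query = [], []
    for episode_label, cls in enumerate(episode_classes):
        chosen = random.sample(by_class[cls], k_shot + q_queries)
        support += [(i, episode_label) for i in chosen[:k_shot]]
        query += [(i, episode_label) for i in chosen[k_shot:]]
    return support, query
```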
Apart from recent developments in FSL, many researchers have recently proposed graph neural network (GNN) methods that extend deep learning approaches to graph-structured data. In this context, graphs are used as data structures for modeling the relationships (edges) between data instances (nodes), which was first proposed via the graph neural network model [17] and extended via graph convolutional networks [18], semi-supervised graph convolutional networks [19], graph attention networks [20], and message passing neural networks [21]. Since FSL methods are centered around modeling relationships between the examples in the support and query datasets, GNNs have also gained growing interest in FSL research, including approaches aggregating node information from densely connected support and query image graphs [22], transductive inference [23], and edge-labeling [24]. GNNs can be computationally prohibitive on large datasets. However, we note that one of the defining characteristics of FSL is that datasets for meta-training and meta-testing contain only a few examples per class, so the computational cost of graph construction remains small in FSL.
Previous research has shown that FSL can be improved by incorporating additional information. For instance, unlabeled data, which is used in conventional [25], self-trained [26], and transfer learning-based [27] semi-supervised FSL, can improve the predictive performance of FSL models. FSL also benefits from the inclusion of additional modalities (e.g., textual information describing the images to be classified), which was demonstrated via an adaptive cross-modal approach enhancing metric-based FSL [28] as well as cross-modal FSL utilizing latent features from aligned autoencoders [29]. While the aforementioned works showed that additional external information benefits FSL, we raise the question of whether additional internal information can be useful as well.
While the incorporation of additional information can be beneficial, the utilization of additional internal information is not very common in FSL research; only two recent papers have explored this approach, namely Li et al.'s deep nearest neighbor neural network (DN4) [30] and the dense classification network by Lifchitz et al. [31]. In these works, the researchers expanded the feature embeddings (the low-dimensional representations) of the data inputs (i.e., images), extracted from the last layer of the neural network, into higher-dimensional embeddings. These higher-dimensional embeddings were split into several smaller vectors, such that multiple embedding vectors corresponded to the same image. In the DN4 model proposed by Li et al. [30], the last layer's feature embeddings were expanded to form many local descriptors. The dense classification network by Lifchitz et al. [31] expanded the feature embeddings into three separate vectors that are used for computing the cross-entropy loss during training.
When it comes to utilizing additional internal information, both DN4 [30] and the dense classification network [31] only considered the last layer's information. In contrast to existing work on FSL, we consider additional information that is hidden in the earlier layers of the neural network. We hypothesize that such internal information benefits an FSL model's predictive performance. More specifically, the extra information considered in this work comprises the feature embeddings that can be obtained from layers before the last layer. We propose using a graph structure to integrate this lower-level information into the neural network, since graph structures are well-suited for modeling relationships in data.
We refer to the FSL method proposed in this paper as Looking-Back because, unlike DN4 [30] and the dense classification network [31], it fully utilizes previous layers' feature embeddings (i.e., lower-level information) rather than focusing on the final layer's feature embeddings alone. During training, the lower-level information is expected to help the meta-learner absorb more information overall. Although this lower-level information may not be as useful as the embedding vectors obtained from the last layer, we hypothesize that it has a positive impact on the meta-learner. To test this hypothesis, we adopt Conv-64F [30], a backbone widely used in few-shot learning, and construct graphs for label propagation, following the transductive propagation network (TPN) [23], to capture lower-level information.
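To illustrate what we mean by lower-level information, the sketch below shows a Conv-64F-style backbone in PyTorch that returns the feature maps of its last three blocks rather than only the final one. The module name and the choice of which blocks to expose are our assumptions for illustration, not the exact implementation used in this work.

```python
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # One Conv-64F block: 3x3 convolution -> batch norm -> ReLU -> 2x2 max pooling.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )

class Conv64FMultiOut(nn.Module):
    """Conv-64F-style backbone that also exposes lower-level feature maps."""
    def __init__(self):
        super().__init__()
        self.block1 = conv_block(3, 64)
        self.block2 = conv_block(64, 64)
        self.block3 = conv_block(64, 64)
        self.block4 = conv_block(64, 64)

    def forward(self, x):
        f1 = self.block1(x)
        f2 = self.block2(f1)   # lower-level ("looking-back") embedding
        f3 = self.block3(f2)   # lower-level ("looking-back") embedding
        f4 = self.block4(f3)   # last-layer embedding
        return f2, f3, f4      # feature maps from the last three blocks
```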
Besides the feature embeddings of the last layer, the previous layers' feature embeddings are also used for computing the pair-wise similarities between the inputs, based on relational network modules [23]. In the Looking-Back method, three groups of pair-wise similarity measures are computed. The similarity scores between all support and query images in one episode yield three separate graph Laplacians, which are used for iterative label propagation to generate three separate cross-entropy losses. The losses from the lower-level features are used during meta-training and, as the experimental results indicate, enhance the performance of the meta-learner. After meta-training, we use the last layer's feature embeddings for testing on new tasks (i.e., images with class labels that are not seen during training) in a transductive fashion. As the experimental results reveal, the resulting FSL models achieve better predictive performance on new, unseen tasks than models generated by meta-learners that do not utilize lower-level information.
The contributions of this work can be summarized as follows:
We propose a novel FSL meta-learning method, Looking-Back, that utilizes lower-level information from hidden layers, in contrast to existing FSL methods that only use the feature embeddings of the last layer during meta-training.
We implement our Looking-Back method using a graph neural network, which leverages the advantages of graph structures for few-shot learning to absorb the lower-level information in the hidden layers of the neural network.
We evaluate our proposed Looking-Back method on two popular FSL datasets, miniImageNet and tieredImageNet, and achieve new state-of-the-art results, providing supporting evidence that using lower-level information can result in better meta-learners for FSL tasks.