Article

A Semantic-Preserving Deep Hashing Model for Multi-Label Remote Sensing Image Retrieval

1 School of Electronics and Information Engineering, Huazhong University of Science and Technology, Wuhan 430074, China
2 State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430079, China
3 Department of Plant Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
* Author to whom correspondence should be addressed.
Remote Sens. 2021, 13(24), 4965; https://doi.org/10.3390/rs13244965
Submission received: 3 November 2021 / Revised: 2 December 2021 / Accepted: 3 December 2021 / Published: 7 December 2021

Abstract

Conventional remote sensing image retrieval (RSIR) systems perform single-label retrieval, where a single label represents the most dominant semantic content of an image. Improved spatial resolution dramatically boosts the complexity of remote sensing image scenes, as a remote sensing image always contains multiple categories of surface features. In this case, a single label cannot comprehensively describe the semantic content of a complex remote sensing image scene and therefore results in poor retrieval performance in practical applications. As a result, researchers have begun to pay attention to multi-label image retrieval. However, in the era of massive remote sensing data, how to increase retrieval efficiency and reduce feature storage while preserving semantic information remains unsolved. Considering the powerful capability of hashing learning in overcoming the curse of dimensionality caused by high-dimensional image representations in Approximate Nearest Neighbor (ANN) search problems, we propose a new semantic-preserving deep hashing model for multi-label remote sensing image retrieval. Our model consists of three main components: (1) a convolutional neural network to extract image features; (2) a hash layer to generate binary codes; and (3) a new loss function to better preserve, during hash learning, the multi-label semantic information contained in complex remote sensing image scenes. As far as we know, this is the first attempt to apply deep hashing to multi-label remote sensing image retrieval. Experimental results indicate the effectiveness and promise of introducing hashing methods into multi-label remote sensing image retrieval.

1. Introduction

With the rapid development of Earth observation technology, the volume of remote sensing data has increased exponentially [1,2,3]. Remote sensing images have broad applications such as early warning for natural disasters, emergency response, and urban construction planning [4,5]. It is a challenging task to quickly obtain remote sensing images of interest from a pool of massive remote sensing images, stimulating extensive research interests among the scientific community [6,7].
Early text-based remote sensing image retrieval (TBRSIR) performed retrieval based on pre-annotated textual information such as acquisition time, resolution, and sensor type. Considering the tedious and burdensome manual annotation effort and the ambiguity of text-based retrieval, researchers have turned to content-based remote sensing image retrieval (CBRSIR). CBRSIR is typically composed of two main parts: feature extraction and similarity measurement. Due to the rapid advances of artificial intelligence technology and computer hardware, obtaining deep features through convolutional neural networks has become a substitute for traditional hand-crafted features. It has been proved that convolutional neural networks, whether pre-trained or fine-tuned, can greatly improve the retrieval performance of remote sensing images [8,9,10,11,12].
At present, most CBRSIR methods are based on a single label. For simple scenes, such as forests and chaparral, a single label is enough to obtain excellent retrieval results, but for complex scenes, such as sparse residential areas, a single label cannot distinguish the multiple categories within an image. Remote sensing images are usually composed of multiple categories; in this case, a single label, which represents the most significant semantic content of a remote sensing image, might ignore the complex and abundant information contained in that image. As shown in Figure 1, where we select two example images from the single-label dataset UCMD and the multi-label dataset MLRSD, it is difficult to accurately describe complex remote sensing image scenes using a single label. Therefore, in recent years, in order to overcome the limitations of single-label remote sensing image retrieval, efforts have been made towards multi-label retrieval of remote sensing images [13,14,15,16,17,18], and these studies suggest that multi-label retrieval dramatically outperforms single-label retrieval.
However, in current DCNN-based remote sensing image retrieval architectures, the feature dimensionality of remote sensing images is usually very high [12]. The larger the dataset, the greater the memory cost and the longer the time required to search images. For example, 10 million images with 512-dimensional feature vectors require about 20 GB of memory, which is very unfavorable for massive remote sensing image retrieval. In addition, the time-consuming distance computation between feature vectors makes it difficult to meet the real-time retrieval requirements of actual applications. Thus, how to achieve real-time retrieval of remote sensing images in the era of massive remote sensing data is still challenging.
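As a quick check of the storage figures quoted in this and the following paragraph, a back-of-the-envelope calculation (our own, assuming 32-bit floats for real-valued features) might look as follows:

```python
n_images = 10_000_000
real_valued_gb = n_images * 512 * 4 / 1e9   # 512-D float32 feature vectors -> ~20.5 GB
hash_codes_mb = n_images * 128 / 8 / 1e6    # 128-bit binary hash codes     -> 160 MB
print(f"{real_valued_gb:.1f} GB of real-valued features vs. {hash_codes_mb:.0f} MB of hash codes")
```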
Recently, hashing methods have become increasingly attractive for Approximate Nearest Neighbor (ANN) search problems [19]. By mapping high-dimensional image feature vectors to compact binary hash codes in Hamming space and computing Hamming distances with simple bit operations (XOR), hashing methods can save computation time and reduce memory consumption to a great degree. For example, 10 million images with 512-dimensional feature vectors represented by 128-bit hash codes only need 160 MB of storage. Due to their advantages of simple structure, low space cost, and flexibility to scale, hashing methods have been widely used for fast large-scale image retrieval. To mitigate the semantic gap resulting from encoding high-dimensional data into binary bits, researchers have proposed strategies to maintain semantic similarity. For example, in the natural image field [20], Zhang et al. used an optimized loss function to preserve semantic similarity. However, in the remote sensing field, almost all existing deep hashing networks are based on single-label learning, which cannot adapt to complex remote sensing scenes. In more detail, remote sensing images are usually composed of multiple land categories, so a single label describing the most significant semantic content might ignore the complex and abundant information contained in the image. To tackle this dilemma, we attempt to introduce multi-label supervision into the deep hashing framework. Furthermore, we propose a pair-wise label similarity loss to fully exploit the multi-label semantic information, including a hard similarity loss represented by cross entropy and a soft similarity loss measured by mean square error. Specifically, the hard similarity loss accounts for completely similar or dissimilar sample pairs, while the soft similarity loss considers partially similar sample pairs; together they encourage the deep hashing model to preserve the semantic consistency between the input paired image samples. The purpose of our research is to improve efficiency and reduce storage in multi-label image retrieval without losing accuracy. The main contributions of this paper can be summarized as follows:
(1) We propose a semantic-preserving deep hashing model for multi-label remote sensing image retrieval. As far as we know, this is the first attempt to introduce hashing methods into multi-label remote sensing image retrieval.
(2) We propose paired label similarity loss to make full use of multi-label semantic information, including hard similarity loss represented by cross-entropy and soft similarity loss measured by mean square error. Specifically, the hard similarity loss considers completely similar or dissimilar sample pairs, while the soft similarity loss considers partially similar sample pairs. Together, they encourage the deep hash model to maintain semantic consistency between the input paired image samples.
We conduct comparative experiments with other baseline models, including the IDHN Model [21], the improved ISDH model [22], the DMSH model [23], the DHN model [24], and the DPSH model [25], to assess the effectiveness of our proposed model.
The remainder of this paper is organized as follows: in Section 2, related work on multi-label retrieval of natural images, multi-label retrieval of remote sensing images, and image retrieval based on deep hashing is summarized and analyzed. In Section 3, the system architecture of our model is described in detail, with emphasis on the design of the loss function for the deep hashing network. In Section 4, comparative experiments are conducted to demonstrate the superiority of our model. In Section 5, discussions and conclusions are presented, followed by future directions toward which further efforts can be made.

2. Related Work

2.1. Multi-Label Retrieval of Natural Images

In the field of multi-label retrieval of natural images, Li et al. [26] proposed a multi-label nearest neighbor propagation method for region-based image retrieval. Nasierding et al. [27] studied the application of multi-label classification in image annotation and retrieval and made a comprehensive comparison of these methods. Li et al. [28] proposed a new multi-label image annotation method for image retrieval based on annotated keywords; the experimental results show that multiple labels can better describe image content at the semantic level. Ranjan et al. [29] proposed a multi-label canonical correlation analysis algorithm to solve the cross-modal image retrieval problem with multi-label annotation. In addition, multi-label image retrieval based on hash algorithms has also received attention from researchers. For example, Lai et al. [30] studied multi-label image retrieval based on instance-aware hashing. Zhang et al. [21] introduced a quantified definition of pairwise similarity into multi-label image retrieval to achieve effective feature learning and hash-code learning. More recently, Rodrigues et al. [31] presented a survey of hash learning methods for multi-label image retrieval. The achievements of multi-label retrieval of natural images have inspired researchers in other fields such as remote sensing.

2.2. Multi-Label Retrieval of Remote Sensing Images

Multi-label labeling of remote sensing images is a very time-consuming and labor-intensive task and requires certain professional knowledge, which results in the lack of large-scale multi-label remote sensing datasets and limits the research on multi-label remote sensing image retrieval.
In [13], Chaudhuri et al. built a publicly available multi-label remote sensing dataset based on the UCM remote sensing image dataset and proposed a semi-supervised graph-theoretic method for multi-label remote sensing image retrieval. Dai et al. [14,15] combined spectral and spatial description features to design a supervised multi-label remote sensing image retrieval method using sparse reconstruction and evaluated its effectiveness with a new label likelihood measure. Although the work mentioned above achieved good performance compared with single-label retrieval of remote sensing images, it relied on hand-crafted features. Recently, Shao et al. [16] extended the 17 categories of MLRSD to densely label the UCM image library and generate the Dense Labeling Remote Sensing Dataset (DLRSD) for tasks beyond image retrieval, such as semantic segmentation, and evaluated the performance of hand-crafted features and CNN features in remote sensing image retrieval based on MLRSD. Furthermore, Shao et al. [17] proposed a novel multi-label approach based on FCN, which is the first attempt to introduce deep learning into the multi-label retrieval task. Kang et al. [18] proposed a new graph relation network (GRN) for multi-label RS scene classification, which uses a graph structure as input to model the relationships between scenes. Recently, Qi et al. [32] constructed a multi-label high-spatial-resolution remote sensing dataset named MLRSNet for semantic scene understanding, targeting image classification and image retrieval. Sumbul et al. [33] proposed the BigEarthNet-MM dataset, which consists of 590,326 pairs of Sentinel-1 and Sentinel-2 image patches. Sumbul et al. [34] proposed a novel triplet sampling method for remote sensing image retrieval that selects informative triplets so that the model learns more accurately, and in [35] proposed a novel learning strategy that trains a deep neural network to automatically predict a graph structure for remote sensing images.
All of the work mentioned above shows that the use of deep learning for multi-label retrieval of remote sensing images can effectively improve retrieval performance. However, these approaches have not paid much attention to efficiency and memory cost, which limits their application to massive remote sensing image data in practical scenarios.

2.3. Image Retrieval Based on Deep Hashing

Recently, some researchers have proposed deep hashing-based image retrieval methods [36,37,38] to evaluate the effectiveness of integrating deep learning with hashing learning. Existing hash learning methods can be divided into unsupervised and supervised hash learning. Typical unsupervised hashing methods include Locality Sensitive Hashing (LSH) [39], Iterative Quantization (ITQ) [40], Spectral Hashing (SH) [41], and Discrete Graph Hashing (DGH) [42]. Unsupervised hashing methods only use the distribution of the image data to learn the hash function and do not use image labels; in contrast, supervised hash learning methods exploit supervised information during hash learning, which improves the accuracy of the hash algorithm.
A variety of supervised hash learning methods for image retrieval have been proposed. In 2014, Xia et al. [43] first proposed a staged convolutional neural network hashing method (CNNH) in which deep learning was integrated into the hashing method; compared with hand-crafted features, the performance of learned features was significantly improved. Subsequently, Li et al. [25] proposed Deep Pairwise Supervised Hashing (DPSH) to implement end-to-end hash learning. Liu et al. [44] proposed a deep supervised hashing model (GAAH) for remote sensing image retrieval under the framework of Generative Adversarial Networks (GANs). To address the problem of limited labeled data for remote sensing images, Roy et al. [45] proposed a deep hashing network based on metric learning (MiLaN). The above methods, however, do not really make use of multi-label information: when two images share a common label, their similarity is set to 1, otherwise it is set to 0. Given this fact, Zhao et al. [46] proposed the Deep Semantic Ranking Hashing method (DSRH), which returns similar images in a ranked order. Ranjan et al. [29] presented a multi-label cross-modal retrieval system by introducing multi-label Canonical Correlation Analysis (ml-CCA). Wu et al. [47] proposed a deep uniqueness-aware hashing method, which adds a multi-label classification layer after the hash layer to make the learned hash codes more discriminative. All of the studies mentioned demonstrate the effectiveness of hashing learning methods in CNN-based image retrieval. Inspired by these works, we combine the advantages of hash learning and deep learning for multi-label retrieval of remote sensing images. Since remote sensing images are usually composed of multiple land categories, a single label describing the most significant semantic content might ignore the complex and abundant information contained in the image; we therefore propose a semantic-preserving deep hashing model to verify and evaluate the performance of deep hashing-based multi-label remote sensing image retrieval.

3. A Semantic-Preserving Deep Hashing Model

In this section, we first introduce the basic idea of deep hashing-based multi-label image retrieval in Section 3.1, then describe the system architecture and retrieval process of our multi-label image retrieval model in detail in Section 3.2, and finally focus on the design of the loss function in Section 3.3.

3.1. The Basic Idea of Deep Hashing Based Multi-Label Image Retrieval

The basic idea of deep hashing-based multi-label image retrieval is to learn a feature mapping function by using the similarity information of the multi-label remote sensing images and then to map each input image to a hash code of length q. The obtained hash codes are used to calculate Hamming distances that measure the similarity between remote sensing images. The image similarity supervision information S = {s_ij | i, j = 1, 2, …, n} required for deep hashing can be calculated as the cosine similarity between the multi-label vectors. The cosine similarity between remote sensing images I_i and I_j is defined as follows:
S_{ij} = \frac{\langle L_i, L_j \rangle}{\lVert L_i \rVert \, \lVert L_j \rVert}    (1)
In Equation (1), L_i and L_j refer to the label vectors of the multi-label remote sensing images I_i and I_j, respectively; ⟨L_i, L_j⟩ denotes the inner product of L_i and L_j, equal to L_i^T L_j; and ‖·‖ denotes the L2 norm. According to Equation (1), paired multi-label remote sensing images exhibit three kinds of similarity: completely similar (S_ij = 1), partially similar (0 < S_ij < 1), and dissimilar (S_ij = 0). For an ANN search task, the hash codes B = {b_i | b_i ∈ {−1, 1}^q}, i = 1, …, n, learned by deep hashing are required to retain the similarity of the paired images. The similarity value of a pair of hash codes b_i and b_j is inversely proportional to their Hamming distance. That is, given a pair of hash codes b_i and b_j, if S_ij = 1, their Hamming distance should be as small as possible, ideally close to 0; if S_ij = 0, their Hamming distance should be as large as possible, ideally approaching q; and if 0 < S_ij < 1, their Hamming distance should lie between the minimum and maximum distances.
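For concreteness, a minimal PyTorch sketch of the label-level cosine similarity in Equation (1) could look as follows (the function name and tensor shapes are illustrative, not code from the paper):

```python
import torch

def label_cosine_similarity(labels_a, labels_b, eps=1e-8):
    """Pairwise cosine similarity between multi-hot label vectors (Equation (1)).

    labels_a: (m, c) float tensor, labels_b: (n, c) float tensor, c = number of categories.
    Returns an (m, n) matrix S with entries in [0, 1]:
    1 -> identical label sets, 0 -> disjoint label sets, otherwise partially similar.
    """
    a = labels_a / (labels_a.norm(dim=1, keepdim=True) + eps)
    b = labels_b / (labels_b.norm(dim=1, keepdim=True) + eps)
    return a @ b.t()
```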

3.2. System Architecture of Deep Hashing Model

The whole system architecture of our model is shown in Figure 2 and is mainly composed of two parts, namely the deep feature extraction module and the hash learning module. The deep feature extraction module is responsible for generating high-level, abstract image representations through a multi-level architecture, while the hash learning module is responsible for mapping each image representation to a binary sequence in Hamming space. To preserve the similarity information of multi-label remote sensing image pairs and control hashing quality, we improve the pair-wise similarity loss and use a quantization loss to limit the output value range of the hash network.
The training process of our model is as follows: paired images are fed into the model; the high-dimensional features of the multi-label remote sensing images are extracted through multiple convolutional layers and two fully connected layers; the outputs of the two fully connected layers (Fc_6 and Fc_7) are then fed into a hash layer to generate a hash code of length q; and the image similarity is used as supervision information to train the model in an end-to-end manner. In the retrieval process, the multi-label remote sensing image is encoded into a binary code, the distances between the binary code of the query image and those of the other images are calculated, and finally the ranked search results are returned.
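The retrieval step reduces to XOR-and-popcount operations over packed binary codes; a small NumPy sketch (our own illustration, with hypothetical function names) is shown below:

```python
import numpy as np

def pack_codes(codes_pm1):
    """Pack {-1, +1} hash codes of length q into uint8 bit arrays."""
    bits = (codes_pm1 > 0).astype(np.uint8)      # map -1/+1 to 0/1
    return np.packbits(bits, axis=1)              # shape: (n, ceil(q / 8))

def hamming_rank(query_packed, db_packed):
    """Rank database entries by Hamming distance to a single packed query code."""
    xor = np.bitwise_xor(query_packed[None, :], db_packed)   # differing bits
    dist = np.unpackbits(xor, axis=1).sum(axis=1)             # popcount per code
    order = np.argsort(dist)
    return order, dist[order]
```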
We choose AlexNet as the feature extraction backbone network and also use the VGG16 network to extract image features to show the scalability of our model. To make full use of the deep features extracted by the first fully connected layer, we additionally feed the first fully connected layer into the hash layer. Table 1 gives the main parameters of our deep hashing multi-label retrieval model, in which Conv_i denotes the i-th convolutional layer, Maxpool_i denotes the i-th pooling layer, and Fc_i denotes the i-th fully connected layer. In our model, a hash layer with a q-bit output replaces the classification layer of the AlexNet network. In addition, to reduce the possible loss of semantic information and make full use of the deep features extracted by the first fully connected layer, the first fully connected layer Fc_6 is connected to the hash layer Fc_8.
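A minimal PyTorch sketch of this architecture (our own reading of Table 1 and Figure 2; the reuse of torchvision's pre-trained AlexNet layers and the exact layer grouping are assumptions) might look like this:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class DeepHashNet(nn.Module):
    """AlexNet backbone with a hash layer fed by the concatenation of Fc_6 and Fc_7."""

    def __init__(self, q=48):
        super().__init__()
        backbone = models.alexnet(pretrained=True)
        self.features = backbone.features             # Conv_1 ... Maxpool_5
        self.avgpool = backbone.avgpool
        self.fc6 = backbone.classifier[:3]             # Dropout, Linear(9216, 4096), ReLU
        self.fc7 = backbone.classifier[3:6]            # Dropout, Linear(4096, 4096), ReLU
        self.hash_layer = nn.Linear(4096 + 4096, q)    # Fc_8 on the merged [Fc_6, Fc_7]

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(self.avgpool(x), 1)          # 1 x 1 x 9216
        f6 = self.fc6(x)
        f7 = self.fc7(f6)
        u = self.hash_layer(torch.cat([f6, f7], dim=1))
        return F.softsign(u)                           # relaxed outputs in (-1, 1)

# At retrieval time the relaxed outputs are binarized with the sign function (Formula (2)):
# b = torch.sign(model(images))
```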
To obtain the binary hash code, we use the symbolic function to quantify the unified features and transform the output of the deep hashing network. The formula for generating a binary hash code is as follows:
b_{i,k} = \mathrm{sgn}(u_{i,k})    (2)
In Formula (2), when the k-th element of the vector u_i is greater than 0, the k-th element of the vector b_i is 1; otherwise it is −1. In this way, the deep hashing network can directly encode remote sensing images and obtain the corresponding binary codes.

3.3. Design of Loss Function

The design of loss function directly affects the quality of the hash code. In our model, the loss function is composed of a paired similarity loss function and a quantization loss.
(1) Paired similarity loss
Based on the three kinds of similarity, the similarity between image pairs can be divided into hard similarity L_hard and soft similarity L_soft. When image I_i and image I_j are completely similar or dissimilar (S_ij = 1 or S_ij = 0), the similarity of the image pair (I_i, I_j) belongs to hard similarity, and a cross entropy loss is suitable for the loss calculation. Similarly, when image I_i and image I_j are partially similar (0 < S_ij < 1), the similarity of the image pair belongs to soft similarity, and the mean square error is used to represent the loss. In recent years, many deep hashing methods have defined pairwise similarity in a hard-assignment manner: if two images share at least one class label, the pairwise similarity is "1", and if they share no class label, it is "0". However, this definition of similarity does not reflect the similarity ranking of paired images containing multiple labels. To express the hard similarity and soft similarity uniformly, we introduce an indicator C_ij when designing the loss function: C_ij = 1 selects the cross entropy loss and C_ij = 0 selects the mean square error loss. Since the value range of L_soft is 2q times the value range of L_hard, we introduce a hyperparameter β to balance the two. Inspired by [39], in order to preserve the semantic similarity of the paired multi-label remote sensing images in the Hamming space, we construct an inner product expression δ_ij = φ⟨b_i, b_j⟩ based on the hash codes and apply it to the pairwise similarity term of the deep hashing loss function. In addition, we use the similarity S_ij of paired images to weight the paired similarity loss, expressed as e^{αS_ij}, where α is an adjustment parameter. Therefore, our paired similarity loss is calculated as follows:
L_{similarity} = e^{\alpha s_{ij}} \left( C_{ij} L_{hard} + \beta (1 - C_{ij}) L_{soft} \right) = \sum_{s_{ij} \in S} e^{\alpha s_{ij}} \left[ C_{ij} \left( \log\left(1 + e^{\delta_{ij}}\right) - s_{ij}\,\delta_{ij} \right) + \beta (1 - C_{ij}) \left( \frac{b_i^{\mathrm{T}} b_j + q}{2} - s_{ij}\,q \right)^{2} \right]    (3)
In Formula (3), b_i ∈ {−1, 1}^q, and φ is a hyperparameter that controls the range of the inner product value and prevents the vanishing-gradient effect that occurs when an overly large inner product is fed into the sigmoid-like term. We take φ = 0.5, that is, δ_ij = 0.5 b_i^T b_j.
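A minimal PyTorch sketch of this pairwise loss, following the reconstruction in Formula (3) and written directly on the relaxed network outputs u used during training (function name and defaults are illustrative):

```python
import torch
import torch.nn.functional as F

def pairwise_similarity_loss(u_i, u_j, s_ij, q, alpha=0.5, beta=None, phi=0.5):
    """Paired similarity loss of Formula (3) on relaxed outputs u in (-1, 1).

    u_i, u_j : (n, q) relaxed hash outputs for the two images of each pair
    s_ij     : (n,) label cosine similarities in [0, 1]
    """
    beta = 1.0 / q if beta is None else beta            # balances hard vs. soft ranges
    c_ij = ((s_ij == 0) | (s_ij == 1)).float()          # 1: hard pair, 0: soft pair
    inner = (u_i * u_j).sum(dim=1)                      # u_i^T u_j, in (-q, q)
    delta = phi * inner                                 # scaled inner product
    l_hard = F.softplus(delta) - s_ij * delta           # log(1 + e^delta) - s_ij * delta
    l_soft = ((inner + q) / 2.0 - s_ij * q) ** 2        # mean-square term for soft pairs
    weight = torch.exp(alpha * s_ij)                    # similarity-based weighting
    return (weight * (c_ij * l_hard + beta * (1.0 - c_ij) * l_soft)).sum()
```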
(2) Quantization loss
For the binary representation of hash codes, most existing hash algorithms use activation functions such as sigmoid or tanh to restrict the output value range of the hash layer and use a quantization loss to constrain the network outputs to lie near the discrete values. However, the sigmoid/tanh activation functions saturate: the closer the activation output is to ±1, the smaller the gradient propagated to the lower layers during training, which easily leads to vanishing gradients. We therefore use a more relaxed activation function, the softsign function f(x) = x/(1 + |x|), to limit the output of the hash layer to the range (−1, 1). We replace the binary code b_i with the network output value u_i and employ a quantization loss to constrain each element of u_i to lie near the discrete values ±1. Accordingly, δ_ij is redefined as 0.5⟨u_i, u_j⟩ = 0.5 u_i^T u_j, and the paired quantization loss L_Q can be defined as:
L_Q = \sum_{i,j}^{n} \left( \left\lVert \left| u_i \right| - \mathbf{1} \right\rVert_1 + \left\lVert \left| u_j \right| - \mathbf{1} \right\rVert_1 \right)    (4)
Here, ‖·‖_1 is the L1 norm, |·| is the element-wise absolute value, and 1 is an all-ones vector of the same dimension as u_i. The total loss function of the deep hashing model can then be defined as:
L = e^{\alpha s_{ij}} \left( C_{ij} L_{hard} + \beta (1 - C_{ij}) L_{soft} \right) + \gamma L_Q = \sum_{i,j}^{n} \left\{ e^{\alpha s_{ij}} \left[ C_{ij} \left( \log\left(1 + e^{\delta_{ij}}\right) - s_{ij}\,\delta_{ij} \right) + \beta (1 - C_{ij}) \left( \frac{u_i^{\mathrm{T}} u_j + q}{2} - s_{ij}\,q \right)^{2} \right] + \gamma \left( \left\lVert \left| u_i \right| - \mathbf{1} \right\rVert_1 + \left\lVert \left| u_j \right| - \mathbf{1} \right\rVert_1 \right) \right\}    (5)
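Continuing the sketch above, the quantization term of Formula (4) and the total objective of Formula (5) could be written as follows (reusing the pairwise_similarity_loss sketch; hyperparameter defaults follow Section 4.3):

```python
import torch

def quantization_loss(u_i, u_j):
    """Formula (4): pull every relaxed output element toward the discrete values +/-1."""
    return (u_i.abs() - 1.0).abs().sum() + (u_j.abs() - 1.0).abs().sum()

def total_loss(u_i, u_j, s_ij, q, alpha=0.5, beta=None, gamma=1.0):
    """Formula (5): weighted pairwise similarity loss plus the quantization term."""
    return (pairwise_similarity_loss(u_i, u_j, s_ij, q, alpha=alpha, beta=beta, phi=0.5)
            + gamma * quantization_loss(u_i, u_j))
```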
In a nutshell, our model extracts image features through a convolutional neural network and generates binary codes through a hash layer; finally, we design a new loss function to better maintain the multi-label semantic information contained in complex remote sensing image scenes, so as to improve retrieval performance.

4. Experiments and Analysis

4.1. Dataset and Evaluation Metrics

4.1.1. Dataset

The first dataset we used was DLRSD (Dense Labeling Remote Sensing Dataset) [16]. The label data of each image in the DLRSD image library is a segmentation map, which we parsed to extract the multi-labels of the image; DLRSD therefore has richer annotation information, as illustrated in Figure 3. Table 2 shows the 17 categories and corresponding label IDs of the DLRSD dataset. The second dataset used in our experiments is the multi-label UCMerced dataset annotated by Chaudhuri et al. [13], which also contains 17 categories. Its predecessor is the UCM remote sensing image dataset, consisting of 2100 images (100 per scene class) with a spatial resolution of 30 cm/pixel and an image size of 256 × 256 pixels. Table 3 shows the categories and corresponding label IDs of the UCMerced dataset.

4.1.2. Evaluation Metrics

Metrics used to evaluate the performance of the multi-label image retrieval model in our experiments include Accuracy, Recall, Precision, F1-measure, and Hamming Loss (HL). Hamming Loss directly counts the incorrectly predicted labels. Across these five evaluation metrics, the smaller the value of HL and the larger the values of Accuracy, Precision, Recall, and F1, the better the model. Among them, F1 jointly considers Precision and Recall.
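The paper does not spell out the per-pair formulas, so the sketch below uses the common example-based definitions of these metrics between a query label vector and a retrieved image's label vector; treat it as an assumption rather than the authors' exact protocol:

```python
import numpy as np

def multilabel_metrics(y_query, y_retrieved, eps=1e-8):
    """Example-based metrics between two multi-hot label vectors (common definitions)."""
    y_q = y_query.astype(bool)
    y_r = y_retrieved.astype(bool)
    inter = np.logical_and(y_q, y_r).sum()
    union = np.logical_or(y_q, y_r).sum()
    accuracy = inter / (union + eps)                   # Jaccard-style accuracy
    precision = inter / (y_r.sum() + eps)
    recall = inter / (y_q.sum() + eps)
    f1 = 2 * precision * recall / (precision + recall + eps)
    hamming_loss = np.logical_xor(y_q, y_r).mean()     # fraction of mismatched labels
    return accuracy, precision, recall, f1, hamming_loss
```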

4.2. Experimental Setup

We use PyTorch and the Adam optimizer to train our network. The GPU and CPU of the computer are an NVIDIA GRID M60-8Q and a quad-core Intel(R) Xeon(R) CPU E5-2687W v4, respectively. A total of 80% of the images are used as the training set, and the remaining 20% are used to validate retrieval performance. We use pre-trained weights to fine-tune the parameters of Conv_1 to Conv_5, Fc_6, and Fc_7, with an initial learning rate of 10−5; the hash layer Fc_8 uses an initial learning rate of 10−3. For the remaining parameters, values commonly used in multi-label retrieval were adopted. The training batch size is set to 48, the learning rate is halved every 500 iterations, and the total number of training epochs is 80.
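A hedged PyTorch sketch of this fine-tuning setup, reusing the DeepHashNet sketch from Section 3.2 (the 10−3 hash-layer rate follows our reading of the text):

```python
import torch

model = DeepHashNet(q=48)
optimizer = torch.optim.Adam([
    {"params": model.features.parameters(),   "lr": 1e-5},  # pre-trained Conv_1-Conv_5
    {"params": model.fc6.parameters(),        "lr": 1e-5},  # Fc_6, fine-tuned
    {"params": model.fc7.parameters(),        "lr": 1e-5},  # Fc_7, fine-tuned
    {"params": model.hash_layer.parameters(), "lr": 1e-3},  # Fc_8, trained from scratch
])
# halve the learning rate every 500 iterations
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=500, gamma=0.5)
```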

4.3. Parameter Analysis

In the deep hashing model, α, β, and γ are three important hyperparameters that heavily affect the performance of the model. Among them, α is the weighting coefficient of the pairwise similarity loss; β adjusts the difference in range between the mean square error and cross entropy losses; and γ controls the weight of the quantization loss.
Of these three parameters, β has the greatest impact on metric learning, so we first set γ = 1.0 and α = 0.5, which are the most commonly used values in hashing learning methods, in order to explore the impact of β on the retrieval performance of the deep hashing model. Table 4 lists the retrieval results when β is {0, 0.1/q, 0.5/q, 0.8/q, 1.0/q, 1.2/q, 5.0/q}, where q is the length of the hash code.
From the retrieval results shown in Table 4, it can be seen that when there is no paired soft similarity loss (β = 0), the retrieval results on multi-label remote sensing images are the worst. This finding demonstrates that the paired soft similarity loss has a positive effect on multi-label remote sensing image retrieval. At the same time, when β = 1.0/q, the retrieval results are the best. Therefore, in subsequent experiments, β is set to 1.0/q.
To explore the impact of the hyperparameter α on the multi-label remote sensing image retrieval performance of the proposed model, we then take α = {0, 0.1, 0.5, 1.0}. The experimental results are shown in Table 5: the performance of the deep hashing model is optimal when α = 0.5. Therefore, in our deep hashing model, the hyperparameter α in the loss function takes a value of 0.5.
With β = 1.0/q and α = 0.5, the value ranges of the paired hard similarity loss and the paired soft similarity loss are (0, 0.5eq) and [0, eq), respectively. At the same time, the role of the quantization loss in our deep hashing model is to constrain the network output value range, and it has only a slight impact on the objective function. We therefore take γ = {0, 0.01, 0.1, 1.0} to determine the value of the hyperparameter γ. Table 6 lists the retrieval results of the deep hashing model under different γ. It can be seen from the experimental results that the retrieval performance of the model is best when γ is 1.0.

4.4. Experimental Results

4.4.1. Comparison with Baseline Models

To evaluate the effectiveness of the loss function designed for our model, we first test our model without the fully connected skip connection, denoted DHMR/fc. Similarly, we denote the whole model that incorporates the output features of the first fully connected layer as DHMR. We compared DHMR/fc and DHMR with current representative multi-label hashing methods for natural image retrieval, including the IDHN model [21], the improved ISDH model [22] (marked Im-ISDH in the experiments), the DMSH model [23], the DHN model [24], and the DPSH model [25]. For fair comparison, all deep hashing models use AlexNet as the base network, and the network training parameters are kept consistent. Furthermore, we also extended DHMR to the VGG16 network to explore its scalability and compared our method with the state-of-the-art method in multi-label remote sensing image retrieval.
It can be seen from the sub-graphs in Figure 4 that the accuracy of DHMR/fc is significantly higher than that of the other models, which demonstrates the effectiveness of our designed loss function. In Figure 4, DHMR exhibits better performance than the other models when the number of hash output bits is {12, 24, 36, 48, 56, 64, 72, 84, 96}. It can be seen from Figure 4a–d that DHMR shows the best retrieval performance under the evaluation metrics Accuracy, Precision, Recall, and F1. This result suggests that connecting the first fully connected layer to the hash layer has a positive effect on retrieval performance. In addition, as the number of hash code bits increases, the retrieval performance of each deep hashing algorithm improves, showing that the greater the number of output hash bits, the more information about the original image is preserved.
Moreover, among the comparative models, the DMSH model shows the worst retrieval performance, which may be because it uses the L2 norm to measure the distance between hash codes, blurring the similarity between image pairs. The Im-ISDH model performs unsatisfactorily because of the unbalanced ratio between the large hard similarity cross-entropy loss and the small soft similarity mean square error loss, which is not conducive to model training.
Further comparison of the retrieval performance of different models when bits = 48 and bits = 96 is shown in Table 7. It can be seen that when bits = 48, DHMR shows the best retrieval performance. HL, Accuracy, Precision, Recall and F1 are 0.0779, 0.7354, 0.8245, 0.8452, 0.8197, respectively, which is superior to all of the other five models and DHMR/fc.
In addition, it can be noticed that the performance when bits = 96 is superior to that when bits = 48. This can be explained by the fact that, as the number of bits increases, the image content features become more specific and comprehensive. When bits = 96, DHMR also shows the best retrieval performance; HL, Accuracy, Precision, Recall, and F1 are 0.0761, 0.7413, 0.8303, 0.8505, and 0.8248, respectively.

4.4.2. Performance on Different Settings and Comparison with SOTA Model

(1) Performance comparison of different settings on DLRSD
We extend the proposed model to the VGG16 network to explore its scalability. The comparative models selected in this experiment are IDHN and MSRCF. We choose IDHN because it has the best performance among the comparative models mentioned above; MSRCF was proposed by Shao et al. in [17], who used an FCN to extract the segmentation of each image as multi-label vectors. Table 8 shows the experimental results when bits = 36 and 48. With VGG16 as the backbone network, the performance of DHMR improves significantly, by 0.5%, 1.2%, 1.3%, 0.5%, and 0.9% in HL, Accuracy, Precision, Recall, and F1, respectively, when bits = 48, and DHMR is better than IDHN and DHMR/fc. Furthermore, Table 8 shows that our model is superior to MSRCF, evidenced by improvements in HL, Accuracy, Precision, and F1 of 3.26%, 2.76%, 3.78%, and 1.4% when bits = 48. These results indicate the effectiveness of combining deep learning and hash learning for multi-label remote sensing image retrieval.
(2) Performance comparison of different settings on the UCMerced dataset
Table 9 shows the results on the UCMerced dataset. From Table 9, we can see that our model DHMR outperforms the IDHN model, which performed best in the previous experiments, because we add a similarity-weighted loss function to better exploit the similarity between images. In addition, DHMR is better than DHMR/fc, which shows the effectiveness of the skip connection design. On the whole, the performance of the model improves as the number of bits increases; that is, performance at bits = 48 is better than at bits = 36, meaning that more hash code bits give a more specific description of the image content characteristics. Finally, we compared with the recent DAS-RHDIS model, a method that selects representative triplets from multi-label training images [34]. Our model shows advantages in Accuracy, Precision, Recall, and F1, with improvements of 15.96%, 17.48%, 13.78%, and 15.69%, respectively. This may be because the loss function we designed better maintains the semantic similarity information.

4.4.3. Multi-Label Image Retrieval Instances

To visually show the multi-label remote sensing image retrieval results of different models, we provide the top five images returned by DPSH, DHN, DMSH, Im-ISDH, IDHN, DHMR/fc, and DHMR from the DLRSD dataset. Figure 5 and Figure 6 show retrieval examples for two categories, dense residential and sparse residential.
Each row in Figure 5 and Figure 6 shows the top five retrieved images of one of the models (DPSH, DHN, DMSH, Im-ISDH, IDHN, DHMR/fc, and DHMR). The left image in each row is the query image. The caption below each image lists the feature categories (multi-label ground truth) contained in the image; the categories and label IDs are detailed in Table 2.
In Figure 5 and Figure 6, we choose two complex scenes, dense residential and sparse residential, as query images. The search results returned by DHMR and DHMR/fc are more similar to the query image than those of the other models. For instance, in Figure 5, the fourth image returned by DMSH belongs to the tennis court scene; in Figure 6, the third image returned by Im-ISDH belongs to storage tanks and the fourth image returned by IDHN belongs to baseball diamond.

5. Conclusions and Prospects

In this paper, we propose a semantic-preserving deep hashing model for multi-label remote sensing image retrieval. Deep learning and hash learning are integrated to improve efficiency and reduce storage in complex remote sensing image retrieval without losing accuracy, which is of critical importance in the era of big data.
In our model, we first use a convolutional neural network to extract image features and then skip-connect the first fully connected layer to the hash layer to fully mine the multi-label semantic information contained in remote sensing images. The experimental results demonstrate the effectiveness and superiority of our model. Our attempt to introduce hash learning into multi-label retrieval of remote sensing images is conducive to real-time applications such as environmental monitoring, emergency response, and many other fields.
Our future work will be directed toward applying graph neural networks in our hashing model to mine the semantic relationships among the multiple labels of remote sensing images and further improve the performance of multi-label remote sensing image retrieval.

Author Contributions

Conceptualization, Q.C. and H.H.; Methodology, L.Y.; Software, H.H.; Verification, L.Y.; Formal Analysis, D.G.; Investigation, Y.Z.; Resources, Quality Control, H.H.; Data Management, L.Y.; Writing—Manuscript Preparation, H.H.; Writing—Review and editing, Q.C.; visualization, L.Y.; supervision, P.F.; project management, Q.C.; fund acquisition, Q.C. All authors have read and agreed to the published version of the manuscript.

Funding

The work of this paper was supported by the National Key R&D Program of China (2018YFB0505401), National Natural Science Foundation of China (No. 41771452), and Director Fund of Institute of Remote Sensing and Digital Earth (Y5SJ1500CX).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors would like to thank reviewers for reviewing this paper and providing important feedback throughout its development.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of the data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Deren, L.I.; Zhang, L.; Xia, G. Automatic analysis and mining of remote sensing big data. Acta Geod. Cartogr. Sin. 2014, 43, 1211–1216. [Google Scholar]
  2. Ma, Y.; Wu, H.; Wang, L.; Huang, B.; Ranjan, R.; Zomaya, A.; Jie, W. Remote sensing big data computing: Challenges and opportunities. Future Gener. Comput. Syst. 2015, 51, 47–60. [Google Scholar] [CrossRef] [Green Version]
  3. Chi, M.; Plaza, A.; Benediktsson, J.A.; Sun, Z.; Shen, J.; Zhu, Y. Big data for remote sensing: Challenges and opportunities. Proc. IEEE 2016, 104, 2207–2219. [Google Scholar] [CrossRef]
  4. Zhu, Z.; Luo, Y.; Wei, H.; Li, Y.; Qi, G.; Mazur, N.; Li, Y. Atmospheric Light Estimation Based Remote Sensing Image Dehazing. Remote Sens. 2021, 13, 2432. [Google Scholar] [CrossRef]
  5. Zhu, Z.; Luo, Y.; Qi, G.; Meng, J.; Li, Y.; Mazur, N. Remote sensing image defogging networks based on dual self-attention boost residual octave convolution. Remote Sens. 2021, 13, 3104. [Google Scholar] [CrossRef]
  6. Zhou, W.; Newsam, S.; Li, C.; Shao, Z. PatternNet: A benchmark dataset for performance evaluation of remote sensing image retrieval. ISPRS J. Photogramm. Remote Sens. 2018, 145, 197–209. [Google Scholar] [CrossRef] [Green Version]
  7. Sun, H.; Li, S.; Li, W.; Ming, Z.; Cai, S. Semantic-based retrieval of remote sensing images in a grid environment. IEEE Geosci. Remote Sens. Lett. 2005, 2, 440–444. [Google Scholar] [CrossRef]
  8. Napoletano, P. Visual descriptors for content-based retrieval of remote-sensing images. Int. J. Remote Sens. 2018, 39, 1343–1376. [Google Scholar] [CrossRef] [Green Version]
  9. Ye, F.; Xiao, H.; Zhao, X.; Dong, M.; Luo, W.; Min, W. Remote sensing image retrieval using convolutional neural network features and weighted distance. IEEE Geosci. Remote Sens. Lett. 2018, 15, 1535–1539. [Google Scholar] [CrossRef]
  10. Zhou, W.; Newsam, S.; Li, C.; Shao, Z. Learning low dimensional convolutional neural networks for high-resolution remote sensing image retrieval. Remote Sens. 2017, 9, 489. [Google Scholar] [CrossRef] [Green Version]
  11. Imbriaco, R.; Sebastian, C.; Bondarev, E. Aggregated deep local features for remote sensing image retrieval. Remote Sens. 2019, 11, 493. [Google Scholar] [CrossRef] [Green Version]
  12. Tong, X.Y.; Xia, G.S.; Hu, F.; Zhong, Y.; Datcu, M.; Zhang, L. Exploiting deep features for remote sensing image retrieval. A systematic investigation. IEEE Trans. Big Data 2019, 6, 507–521. [Google Scholar] [CrossRef] [Green Version]
  13. Chaudhuri, B.; Demir, B.; Chaudhuri, S.; Bruzzone, L. Multilabel remote sensing image retrieval using a semisupervised graph-theoretic method. IEEE Trans. Geosci. Remote Sens. 2017, 56, 1144–1158. [Google Scholar] [CrossRef]
  14. Dai, O.E.; Demir, B.; Sankur, B.; Bruzzone, L. A novel system for content based retrieval of multi-label remote sensing images. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Fort Worth, TX, USA, 23–28 July 2017; pp. 1744–1747. [Google Scholar]
  15. Dai, O.E.; Demir, B.; Sankur, B.; Bruzzone, L. A novel system for content-based retrieval of single and multi-label high-dimensional remote sensing images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 2473–2490. [Google Scholar] [CrossRef] [Green Version]
  16. Shao, Z.; Yang, K.; Zhou, W. Performance evaluation of single-label and multi-label remote sensing image retrieval using a dense labeling dataset. Remote Sens. 2018, 10, 964. [Google Scholar] [CrossRef] [Green Version]
  17. Shao, Z.; Zhou, W.; Deng, X.; Zhang, M.; Cheng, Q. Multilabel remote sensing image retrieval based on fully convolutional network. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 318–328. [Google Scholar] [CrossRef]
  18. Kang, J.; Fernandez-Beltran, R.; Hong, D.; Chanussot, J.; Plaza, A. Graph relation network: Modeling relations between scenes for multilabel remote-sensing image classification and retrieval. IEEE Trans. Geosci. Remote Sens. 2020, 59, 4355–4369. [Google Scholar] [CrossRef]
  19. Wang, J.; Zhang, T.; Sebe, N.; Shen, H.T. A survey on learning to hash. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 769–790. [Google Scholar] [CrossRef] [PubMed]
  20. Zhang, Z.; Zou, Q.; Wang, Q.; Lin, Y.; Li, Q. Instance similarity deep hashing for multi-label image retrieval. arXiv 2018, arXiv:1803.02987. [Google Scholar]
  21. Zhang, Z.; Zou, Q.; Lin, Y.; Chen, L.; Wang, S. Improved deep hashing with soft pairwise similarity for multi-label image retrieval. IEEE Trans. Multimed. 2019, 22, 540–553. [Google Scholar] [CrossRef] [Green Version]
  22. Qiang, B.; Wang, P.; Guo, S.; Xu, Z.; Xie, W.; Chen, J.; Chen, X. Large-scale multi-label image retrieval using residual network with hash layer. In Proceedings of the International Conference on Advanced Computational Intelligence, Guilin, China, 7–9 June 2019; pp. 262–267. [Google Scholar]
  23. Li, T.; Gao, S.; Xu, Y. Deep multi-similarity hashing for multi-label image retrieval. In Proceedings of the ACM on Conference on Information and Knowledge Management, Singapore, 6–10 November 2017; pp. 2159–2162. [Google Scholar]
  24. Zhu, H.; Long, M.; Wang, J.; Cao, Y. Deep hashing network for efficient similarity retrieval. In Proceedings of the AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; Volume 30, pp. 2415–2424. [Google Scholar]
  25. Li, W.J.; Wang, S.; Kang, W.C. Feature learning based deep supervised hashing with pairwise labels. arXiv 2015, arXiv:1511.03855. [Google Scholar]
  26. Li, F.; Dai, Q.; Xu, W.; Er, G. Multilabel neighborhood propagation for region-based image retrieval. IEEE Trans. Multimed. 2008, 10, 1592–1604. [Google Scholar] [CrossRef]
  27. Nasierding, G.; Kouzani, A.Z. Empirical study of multi-label classification methods for image annotation and retrieval. In Proceedings of the International Conference on Digital Image Computing: Techniques and Applications, Sydney, Australia, 1–3 December 2010; pp. 617–622. [Google Scholar]
  28. Li, R.; Zhang, Y.; Lu, Z.; Tian, Y. Technique of image retrieval based on multi-label image annotation. In Proceedings of the Second International Conference on Multimedia and Information Technology, Kaifeng, China, 24–25 April 2010; Volume 2, pp. 10–13. [Google Scholar]
  29. Ranjan, V.; Rasiwasia, N.; Jawahar, C.V. Multi-label cross-modal retrieval. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 4094–4102. [Google Scholar]
  30. Lai, H.; Yan, P.; Shu, X.; Wei, Y.; Yan, S. Instance-aware hashing for multi-label image retrieval. IEEE Trans. Image Process. 2016, 25, 2469–2479. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  31. Rodrigues, J.; Cristo, M.; Colonna, J.G. Deep hashing for multi-label image retrieval: A survey. Artif. Intell. Rev. 2020, 53, 5261–5307. [Google Scholar] [CrossRef]
  32. Qi, X.; Zhu, P.; Wang, Y.; Zhang, L.; Peng, J.; Wu, M.; Chen, J.; Zhao, X.; Zeng, N.; Mathiopoulos, P.T. MLRSNet: A multi-label high spatial resolution remote sensing dataset for semantic scene understanding. ISPRS J. Photogramm. Remote Sens. 2020, 169, 337–350. [Google Scholar] [CrossRef]
  33. Sumbul, G.; de Wall, A.; Kreuziger, T.; Marcelino, F.; Costa, H.; Benevides, P.; Markl, V. BigEarthNet-MM: A Large Scale Multi-Modal Multi-Label Benchmark Archive for Remote Sensing Image Classification and Retrieval. arXiv 2021, arXiv:2105.07921. [Google Scholar]
  34. Sumbul, G.; Ravanbakhsh, M.; Demir, B. Informative and Representative Triplet Selection for Multi-Label Remote Sensing Image Retrieval. IEEE Trans. Geosci. Remote. Sens. 2021. [Google Scholar] [CrossRef]
  35. Sumbul, G.; Demir, B. A Novel Graph-Theoretic Deep Representation Learning Method for Multi-Label Remote Sensing Image Retrieval. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium IGARSS, Brussels, Belgium, 11–16 July 2021; pp. 266–269. [Google Scholar]
  36. Cakir, F.; He, K.; Bargal, S.A.; Sclaroff, S. Hashing with mutual information. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 41, 2424–2437. [Google Scholar] [CrossRef] [Green Version]
  37. Shen, F.; Gao, X.; Liu, L.; Yang, Y.; Shen, H. Deep asymmetric pairwise hashing. In Proceedings of the 25th ACM International Conference on Multimedia, Mountain View, CA, USA, 23–37 October 2017; pp. 1522–1530. [Google Scholar]
  38. Yang, H.; Lin, K.; Chen, C. Supervised learning of semantics-preserving hash via deep convolutional neural networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 437–451. [Google Scholar] [CrossRef] [Green Version]
  39. Zamir, A.R.; Wu, T.L.; Sun, L.; Shen, W.B.; Shi, B.E.; Malik, J.; Savarese, S. Feedback networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1308–1317. [Google Scholar]
  40. Gong, Y.; Lazebnik, S.; Gordo, A.; Perronnin, F. Iterative quantization: A procrustean approach to learning binary codes for large-scale image retrieval. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 35, 2916–2929. [Google Scholar] [CrossRef] [Green Version]
  41. Weiss, Y.; Torralba, A.; Fergus, R. Spectral hashing. In Proceedings of the NIPS, Vancouver, BC, Canada, 8–11 December 2008; Volume 1, pp. 4–12. [Google Scholar]
  42. Liu, W.; Mu, C.; Kumar, S.; Chang, S.F. Discrete Graph Hashing. In Proceedings of the NIPS 2014, Montreal, QC, Canada, 8–13 December 2014. [Google Scholar]
  43. Xia, R.; Pan, Y.; Lai, H.; Liu, C.; Yan, S. Supervised hashing for image retrieval via image representation learning. In Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, Québec City, QC, Canada, 27–31 July 2014. [Google Scholar]
  44. Liu, C.; Ma, J.; Tang, X.; Zhang, X.; Jiao, L. Adversarial hash-code learning for remote sensing image retrieval. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan, 28 July–2 August 2019; pp. 4324–4327. [Google Scholar]
  45. Roy, S.; Sangineto, E.; Demir, B.; Sebe, N. Metric-learning-based deep hashing network for content-based retrieval of remote sensing images. IEEE Geosci. Remote Sens. Lett. 2020, 18, 226–230. [Google Scholar] [CrossRef] [Green Version]
  46. Zhao, F.; Huang, Y.; Wang, L.; Tan, T. Deep semantic ranking based hashing for multi-label image retrieval. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1556–1564. [Google Scholar]
  47. Wu, D.; Lin, Z.; Li, B.; Liu, J.; Wang, W. Deep uniqueness-aware hashing for fine-grained multi-label image retrieval. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, 15–20 April 2018; pp. 1683–1687. [Google Scholar]
Figure 1. Examples of single-label images in UCMD and multi-label images in MLRSD. (a) Single Label: sparse residential. Multi-Label: sparse residential, irrigation bushes, trees, bare soil, buildings. (b) Single Label: sparse residential. Multi-Label: sparse residential, trees, buildings.
Figure 2. An overview of the DHMR model for multi-label remote sensing image retrieval. On one hand, batch images are input into the network to extract high-dimensional features, which are then fed into a hash layer to generate hash codes. On the other hand, the image similarity calculated through label vectors is used as the supervision information to train the model.
Figure 3. An example of dense labeling of the DLRSD dataset.
Figure 4. The Accuracy, Precision, Recall and F1 variation curves of each algorithm under different bits.
Figure 5. Experimental results of multi-label retrieval based on different models (dense residential).
Figure 6. Experimental results of multi-label retrieval based on different models (sparse residential).
Table 1. Parameters of deep hashing multi-label retrieval model.
Layer | Filters | Size | Input | Output
Conv_1 | 64 | 11 × 11/4 | 227 × 227 × 3 | 56 × 56 × 64
Maxpool_1 | – | 3 × 3/2 | 56 × 56 × 64 | 27 × 27 × 64
Conv_2 | 192 | 5 × 5/1 | 27 × 27 × 64 | 27 × 27 × 192
Maxpool_2 | – | 3 × 3/2 | 27 × 27 × 192 | 13 × 13 × 192
Conv_3 | 384 | 3 × 3/1 | 13 × 13 × 192 | 13 × 13 × 384
Conv_4 | 256 | 3 × 3/1 | 13 × 13 × 384 | 13 × 13 × 256
Conv_5 | 256 | 3 × 3/1 | 13 × 13 × 256 | 13 × 13 × 256
Maxpool_5 | – | 3 × 3/2 | 13 × 13 × 256 | 6 × 6 × 256
Flatten | – | – | 6 × 6 × 256 | 1 × 1 × 9216
Fc_6 | 4096 | – | 1 × 1 × 9216 | 1 × 1 × 4096
Fc_7 | 4096 | – | 1 × 1 × 4096 | 1 × 1 × 4096
Merge | – | – | [Fc_6, Fc_7] | 1 × 1 × 8192
Fc_8 | q | – | 1 × 1 × 8192 | 1 × 1 × q
Table 2. Annotation description of DLRSD dataset.
Label ID | Category | Number of Images
1 | airplane | 100
2 | bare soil | 754
3 | building | 713
4 | car | 897
5 | chaparral | 116
6 | court | 105
7 | dock | 100
8 | field | 103
9 | grass | 977
10 | mobile home | 102
11 | pavement | 1331
12 | sand | 291
13 | sea | 101
14 | ship | 103
15 | tank | 100
16 | tree | 1021
17 | water | 208
Table 3. Annotation description of UCMerced dataset.
Label ID | Category | Number of Images
1 | airplane | 100
2 | bare soil | 633
3 | building | 696
4 | car | 884
5 | chaparral | 119
6 | court | 105
7 | dock | 100
8 | field | 106
9 | grass | 977
10 | mobile home | 102
11 | pavement | 1305
12 | sand | 389
13 | sea | 100
14 | ship | 102
15 | tank | 100
16 | tree | 1015
17 | water | 203
Table 4. Retrieval performance of our deep hashing model under different β values (48-bit hash code).
β | HL | Accuracy | Precision | Recall | F1
0 | 0.1044 | 0.6498 | 0.7769 | 0.7597 | 0.7482
0.1/q | 0.0900 | 0.6942 | 0.8059 | 0.8016 | 0.7866
0.5/q | 0.0796 | 0.7269 | 0.8217 | 0.8337 | 0.8129
0.8/q | 0.0792 | 0.7291 | 0.8196 | 0.8370 | 0.8136
1.0/q | 0.0779 | 0.7354 | 0.8245 | 0.8452 | 0.8197
1.2/q | 0.0782 | 0.7301 | 0.8212 | 0.8407 | 0.8151
5.0/q | 0.0808 | 0.7262 | 0.8153 | 0.8340 | 0.8095
Table 5. Retrieval performance of our deep hashing model under different α values (48-bit hash code).
α | HL | Accuracy | Precision | Recall | F1
0 | 0.0797 | 0.7294 | 0.8209 | 0.8394 | 0.8151
0.1 | 0.0795 | 0.7279 | 0.8218 | 0.8368 | 0.8134
0.5 | 0.0779 | 0.7354 | 0.8245 | 0.8452 | 0.8197
1.0 | 0.0787 | 0.7327 | 0.8216 | 0.8443 | 0.8171
Table 6. Retrieval performance of our deep hashing model under different γ values (48-bit hash code).
γ | HL | Accuracy | Precision | Recall | F1
0 | 0.0800 | 0.7286 | 0.8224 | 0.8376 | 0.8144
0.01 | 0.0782 | 0.7342 | 0.8249 | 0.8435 | 0.8183
0.1 | 0.0785 | 0.7329 | 0.8225 | 0.8415 | 0.8168
1.0 | 0.0779 | 0.7354 | 0.8245 | 0.8452 | 0.8197
Table 7. Retrieval performance of each algorithm when bits = 48 and 96.

bits = 48
Model | HL | Accuracy | Precision | Recall | F1
DHMR | 0.0779 | 0.7354 | 0.8245 | 0.8452 | 0.8197
DHMR/fc | 0.0779 | 0.7335 | 0.8238 | 0.8425 | 0.8182
IDHN | 0.0809 | 0.7232 | 0.8193 | 0.8321 | 0.8096
Im-ISDH | 0.1016 | 0.6590 | 0.7811 | 0.7688 | 0.7546
DMSH | 0.1161 | 0.6372 | 0.7423 | 0.7676 | 0.7388
DHN | 0.0930 | 0.6930 | 0.7980 | 0.8062 | 0.7860
DPSH | 0.0914 | 0.6955 | 0.7992 | 0.8086 | 0.7879

bits = 96
Model | HL | Accuracy | Precision | Recall | F1
DHMR | 0.0761 | 0.7413 | 0.8303 | 0.8505 | 0.8248
DHMR/fc | 0.0756 | 0.7402 | 0.8303 | 0.8482 | 0.8246
IDHN | 0.0762 | 0.7361 | 0.8259 | 0.8434 | 0.8201
Im-ISDH | 0.1059 | 0.6467 | 0.7704 | 0.7590 | 0.7449
DMSH | 0.1130 | 0.6489 | 0.7532 | 0.7789 | 0.7492
DHN | 0.0866 | 0.7071 | 0.8093 | 0.8184 | 0.7981
DPSH | 0.0861 | 0.7083 | 0.8117 | 0.8184 | 0.7991
Table 8. Retrieval performance of each algorithm of VGG16 (DLRSD).

bits = 36
Model | HL | Accuracy | Precision | Recall | F1
DHMR | 0.0747 | 0.7421 | 0.8315 | 0.8496 | 0.8257
DHMR/fc | 0.0750 | 0.7377 | 0.8291 | 0.8442 | 0.8216
IDHN | 0.0780 | 0.7279 | 0.8313 | 0.8321 | 0.8140

bits = 48
Model | HL | Accuracy | Precision | Recall | F1
DHMR | 0.0725 | 0.7469 | 0.8374 | 0.8505 | 0.8290
DHMR/fc | 0.0726 | 0.7465 | 0.8375 | 0.8497 | 0.8287
IDHN | 0.0744 | 0.7360 | 0.8373 | 0.8365 | 0.8201
MSRCF | 0.1051 | 0.7193 | 0.7996 | 0.8830 | 0.8150
Table 9. Retrieval performance of each algorithm of VGG16 (UCMerced).

bits = 36
Model | HL | Accuracy | Precision | Recall | F1
DHMR | 0.0805 | 0.7253 | 0.8153 | 0.8372 | 0.8128
DHMR/fc | 0.0807 | 0.7221 | 0.8232 | 0.8333 | 0.8135
IDHN | 0.0857 | 0.5385 | 0.7568 | 0.7621 | 0.7355

bits = 48
Model | HL | Accuracy | Precision | Recall | F1
DHMR | 0.0778 | 0.7276 | 0.8278 | 0.8378 | 0.8139
DHMR/fc | 0.0761 | 0.7269 | 0.8274 | 0.8369 | 0.8134
IDHN | 0.0802 | 0.6350 | 0.7589 | 0.7647 | 0.7378
DAS-RHDIS | – | 0.5680 | 0.6530 | 0.7000 | 0.6750
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
