1. Introduction
Sea ice is the cause of most marine disasters in polar and high-latitude regions. As a component of the global marine and atmospheric system, sea ice, with its high albedo, affects marine industries as well as the exchange of matter and momentum between the atmosphere and the ocean. In addition, sea ice plays a key role in the balance of radiation, energy, and mass at the ocean surface [1,2]. Therefore, studying changes in sea ice is highly important. However, because of the harsh natural environment in areas covered with sea ice, conventional observation methods such as in situ sampling and visual estimation [3] cannot acquire detailed information on sea ice changes in a timely and effective manner. Instead, remote sensing technology, which can cover large areas of sea ice rapidly, is widely used in sea ice detection. Commonly used data sources include National Oceanic and Atmospheric Administration (NOAA) advanced very-high-resolution radiometer (AVHRR) images [4] and moderate-resolution imaging spectroradiometer (MODIS) images from the AQUA and TERRA satellites [5]. Because of the limited spectral resolution of these sensors, the associated sea ice detection methods primarily focus on extracting sea ice extent, and it is difficult to acquire further details about the sea ice, such as its class. Compared with multispectral technologies, hyperspectral remote sensing can acquire nearly continuous spectral information and rich sea ice image information, providing an important resource for sea ice detection. Widely used sea ice detection methods include the threshold method, unsupervised classification (e.g., K-means), and supervised classification (e.g., the support vector machine (SVM)) [6,7].
A hyperspectral data set contains hundreds of narrow spectral bands (the spectral bandwidth is usually less than 10 nm) and therefore a large amount of data [8]. Different types of sea ice exhibit remarkable reflectivity differences and strong separability in certain band ranges, such as 400 to 610 nm [9]. For sea ice detection, the abundant spatial and spectral information in hyperspectral data offers great potential; however, the accompanying surge in data volume poses major challenges for data processing and interpretation. In addition, the high correlation and redundant information among the bands can significantly reduce the precision achievable by traditional image classification methods [10]. Dimensionality reduction can reduce the amount of processing required on the original data, provided that it preserves the important information. As a result, the dimensionality reduction of hyperspectral sea ice data is of great significance for subsequent processing.
Existing dimensionality reduction methods fall primarily into two categories [11]: methods based on feature extraction, such as principal component analysis (PCA) [12], and methods based on band selection, such as entropy [13]. Feature extraction methods change the physical meaning of the original bands during the transformation and consequently lose information to a certain extent. In contrast, band selection methods retain the original band characteristics without affecting their physical meaning; thus, they are widely used in hyperspectral data dimensionality reduction. Numerous band selection methods have been proposed recently. Depending on whether they require prior knowledge, they can be divided into two major types: supervised and unsupervised. Unsupervised band selection methods do not require prior knowledge about ground objects; therefore, they are widely applied in remote sensing image processing.
The basic idea of unsupervised band selection is to calculate the amount of information in each band and the similarity between bands in order to identify the most distinctive and informative bands [14]. Among the methods based on information ranking, such as entropy and adaptive band selection (ABS), entropy was used to detect redundancy between bands in [13], while ABS was investigated in [15]. Among the methods based on minimum-error band reconstruction, such as linear prediction (LP) and first spectral derivative (FSD) [16], two modified LP band selection methods were proposed for unsupervised band selection in [17]. However, these methods have some shortcomings. For instance, the entropy-based method considers only the amount of information in each band itself and ignores the correlation between bands, which results in a non-optimal subset of selected bands. In contrast, the LP-based methods primarily consider the similarity between bands without fully considering the information in the bands themselves. To a certain extent, this affects the final classification accuracy.
To choose an optimal band combination that features both a large amount of information and a low degree of inter-band correlation, this paper proposes an improved similarity measurement method based on LP (ISMLP) for hyperspectral sea ice detection. First, we choose the first band, the one with the maximum amount of information, based on mutual information (MI) theory [18]. Second, we determine the second band, the one least similar to the first, using the spectral correlation measure (SCM) method [19]. Finally, we choose the subsequent bands using the LP method and use a support vector machine model for sea ice classification. The experimental analysis is carried out on Baffin Bay data and Bohai Bay data.
The paper is organized into four sections. In the next section, the improved similarity measurement method is proposed, the general framework of sea ice detection based on the ISMLP method is presented, and the major implementation issues are investigated, such as pixel selection, the band selection method, and the sea ice classification model. Section 3 presents the experimental analysis. Finally, Section 4 summarizes the main contributions and draws the conclusions of this paper.
2. Improved Similarity Measurement Method
The purpose of band selection is to choose, from the available bands, the optimal band combination that contains a large amount of information but has low band similarity. In this paper, band selection was achieved by using LP to perform a similarity comparison. Because LP is sensitive to the original bands, choosing original bands with a large amount of information and low similarity is crucial. As the first step, MI theory, which performs well in terms of information measurement, was used to determine the first original band containing a large amount of information; high-spatial-resolution Landsat images were used as the benchmark. In the second step, the SCM method was used to obtain the second original band because this method has demonstrated good performance in spectrum recognition. Finally, subsequent bands were obtained using the LP method, and the SVM classifier model [20] was applied to classify the sea ice using the selected bands. Data preprocessing was required during the method implementation, for example to determine the band range with good separability for sea ice and to remove poor bands. In addition, we analyzed the influence of different pixel-selection proportions on the band selection process and studied the number of selected bands required after dimensionality reduction. We compared this approach with traditional band selection methods such as entropy, LP, and ABS. The results of these experiments indicate that, overall, the proposed method achieves better classification performance than the traditional methods, which indicates that it can be effectively applied in hyperspectral sea ice detection.
Figure 1 shows the entire framework for detecting sea ice based on the proposed ISMLP method. Four major issues were investigated: (1) original band selection, (2) subsequent band selection, (3) hyperspectral sea ice image classification, and (4) classification accuracy evaluation. The methods are discussed in detail in the following sections.
2.1. Pixel Selection
Due to the large number of pixels in a remote sensing image, the matrices involved in data processing are large, which reduces efficiency. However, using only a relatively small subset of pixels in the band selection process does not change the results in most cases [14]. This is because a high spatial correlation exists among bands of hyperspectral data. Therefore, this study made a comparative analysis of different pixel-selection proportions.
Number of selected pixels: First select a band with all pixels, followed by a comparative analysis using all pixels, 1/100 pixels, 1/1000 pixels, 1/10,000 pixels, etc.
Locations of the selected pixels: When pixels are selected randomly from an entire image at a low proportion, some categories of pixels may not be included. To eliminate this effect, this investigation adopted a pixel-selection method using K-means clustering. The specific steps are as follows:
(1) Select all the bands (after removing the invalid bands without radiometric calibration), perform K-means clustering, and then merge the same categories.
(2) Calculate the number and locations of the pixels in each category, and determine the number of pixels to select from each category.
(3) According to step (2), choose the corresponding pixels in each category uniformly at random.
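The sampling steps above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the K-means labels are assumed to be already available as a flat list with one cluster index per pixel, and the function name `stratified_pixel_sample`, the `proportion` argument, the fixed random seed, and the rule of keeping at least one pixel per category are hypothetical choices made for the example.

```python
import random
from collections import defaultdict


def stratified_pixel_sample(labels, proportion, seed=0):
    """Pick pixel indices from every cluster in proportion to its size.

    labels: flat list of cluster ids, one per pixel (assumed to come from
            a K-means run on all valid bands).
    proportion: fraction of pixels to keep, e.g. 0.01 for 1/100.
    """
    rng = random.Random(seed)

    # Record the locations (pixel indices) that belong to each category.
    locations = defaultdict(list)
    for index, label in enumerate(labels):
        locations[label].append(index)

    # Draw uniformly at random inside each category. Keeping at least one
    # pixel per category is an assumption so small categories are not lost.
    selected = []
    for label in sorted(locations):
        members = locations[label]
        count = max(1, round(len(members) * proportion))
        selected.extend(rng.sample(members, count))
    return sorted(selected)


# Toy label map with three categories of very different sizes.
labels = [0] * 700 + [1] * 200 + [2] * 100
chosen = stratified_pixel_sample(labels, proportion=0.01)
per_class = {c: sum(1 for i in chosen if labels[i] == c) for c in (0, 1, 2)}
print(per_class)
```

Because the draw is made per category, every category is represented even at a low proportion, which a single random draw over the whole image cannot guarantee.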
In the experiments, the ISMLP method was employed to perform the comparative analysis using different pixel-selection proportions. Through experimental analysis, this study found that the pixel-selection method based on K-means clustering [21], which selects pixels from the different categories according to their proportions, significantly improves the computational efficiency while achieving classification accuracies similar to or higher than those obtained using the original data.
2.2. Band Selection Method
2.2.1. Original Band Selection Method
Landsat data with high spatial resolution, acquired at the same time and the same location, were adopted as the base band; then, the band with the highest similarity to the base band, as measured by MI, was chosen as the first initial band.
MI is a basic concept of information theory. It describes the statistical correlation between two random variables, that is, how much information about one variable is contained in the other [22]. Assuming that the higher-resolution benchmark image is taken as the base band B0, the MI between any band A and B0 can be expressed as follows:

$$MI(A, B_0) = \sum_{a \in A} \sum_{b \in B_0} p(a, b) \log \frac{p(a, b)}{p(a)\, p(b)} \qquad (1)$$

where p(a) and p(b) are the respective probability distributions of the pixel values of A and B0, and p(a, b) is the joint probability distribution between them.
MI can be used as a similarity criterion: the greater the MI is, the more information from one band exists in the other band, and the higher the similarity between the two bands. High similarity to the benchmark band indicates that the band contains a greater amount of information. Therefore, the band with the maximum MI value can be selected as the first original band.
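The MI-based choice of the first band can be illustrated with a small histogram estimate. The sketch below makes several assumptions that are not taken from the paper: each band is a flat list of already quantized gray levels, the base band has been resampled to the same pixel grid, the natural logarithm is used, and the names `mutual_information` and `select_first_band` are hypothetical.

```python
import math
from collections import Counter


def mutual_information(band_a, band_b):
    """Histogram estimate of the MI between two equally sized bands."""
    n = len(band_a)
    count_a = Counter(band_a)                 # marginal counts of band A
    count_b = Counter(band_b)                 # marginal counts of base band
    count_ab = Counter(zip(band_a, band_b))   # joint counts

    mi = 0.0
    for (a, b), joint in count_ab.items():
        p_ab = joint / n
        p_a = count_a[a] / n
        p_b = count_b[b] / n
        mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi


def select_first_band(bands, base_band):
    """Return the index of the band with the largest MI to the base band."""
    scores = [mutual_information(band, base_band) for band in bands]
    return max(range(len(bands)), key=scores.__getitem__)


base = [0, 0, 1, 1, 2, 2, 3, 3]
bands = [
    [5, 5, 5, 5, 5, 5, 5, 5],   # constant band: shares no information
    [0, 0, 0, 0, 1, 1, 1, 1],   # coarse version of the base band
    [3, 3, 2, 2, 1, 1, 0, 0],   # one-to-one relabeling of the base band
]
first = select_first_band(bands, base)
print(first)
```

In this toy case the constant band has zero MI with the base band, while the band that is a one-to-one relabeling of the base band reaches the maximum and is selected.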
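To choose the second original band, the one least similar to the first original band, the SCM method is used to calculate the cross-correlation between two bands. The SCM method can be described as follows:
Assuming that B1 and B2 are two bands from the set Φ, with $B_1 = (b_{11}, b_{12}, \ldots, b_{1L})^T$ and $B_2 = (b_{21}, b_{22}, \ldots, b_{2L})^T$, and L is the number of pixels, then the cross-correlation between the two bands is:

$$r(B_1, B_2) = \frac{\sum_{i=1}^{L} (b_{1i} - \bar{B}_1)(b_{2i} - \bar{B}_2)}{\sqrt{\sum_{i=1}^{L} (b_{1i} - \bar{B}_1)^2 \, \sum_{i=1}^{L} (b_{2i} - \bar{B}_2)^2}} \qquad (2)$$

where $\bar{B}_1$ and $\bar{B}_2$ are the averages of bands B1 and B2, respectively. The greater the value is, the higher the similarity between the two bands is. We then choose the band least similar to the first original band as the second original band.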
This method takes both the band information and the similarity between the bands into consideration when choosing the initial band pair. Thus, it chooses a more optimized initial band pair, and provides a better method for subsequent band selection.
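A minimal sketch of the second-band choice is given below. It assumes that the SCM cross-correlation is the mean-centered, normalized correlation between two bands, that bands are flat lists of pixel values with non-zero variance, and that "least similar" means the smallest signed correlation value, as the text suggests (ranking by absolute value would be an alternative). The names `cross_correlation` and `select_second_band` are hypothetical.

```python
import math


def cross_correlation(band_1, band_2):
    """Mean-centered, normalized cross-correlation between two bands."""
    n = len(band_1)
    mean_1 = sum(band_1) / n
    mean_2 = sum(band_2) / n
    covariance = sum((p - mean_1) * (q - mean_2) for p, q in zip(band_1, band_2))
    spread_1 = sum((p - mean_1) ** 2 for p in band_1)
    spread_2 = sum((q - mean_2) ** 2 for q in band_2)
    return covariance / math.sqrt(spread_1 * spread_2)


def select_second_band(bands, first_index):
    """Return the index of the band least correlated with the first band."""
    candidates = [i for i in range(len(bands)) if i != first_index]
    return min(
        candidates,
        key=lambda i: cross_correlation(bands[first_index], bands[i]),
    )


bands = [
    [1, 2, 3, 4, 5],    # first original band (assumed already chosen)
    [2, 4, 6, 8, 10],   # scaled copy: correlation 1
    [1, 3, 2, 5, 4],    # partly correlated
    [2, 1, 2, 1, 2],    # uncorrelated with the first band
]
second = select_second_band(bands, first_index=0)
print(second)
```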
2.2.2. Subsequent Band Selection Method
The basic steps of the subsequent band selection method are as follows [23]:
(1) Initialize the algorithm with the pair B1 and B2, and construct a band subset Φ = {B1, B2}.
(2) Find B3, which is the band least similar to B1 and B2, and then update the selected band subset to Φ = Φ ∪ {B3}.
(3) Repeat step (2) until the subset Φ contains a sufficient number of bands.
In subsequent band selection, LP [16] is used to calculate the similarity between a single band and multiple bands. LP is defined as follows:
Assuming that B1 and B2 are the two bands in the subset Φ, they can be used to estimate any candidate band B, and the band least similar to B1 and B2 is the one that they estimate worst:

$$B' = a_0 + a_1 B_1 + a_2 B_2 \qquad (3)$$

In Equation (3), B′ is the LP estimate of B using B1 and B2, while a0, a1, and a2 are the parameters that minimize the LP error; the error is $e = \lVert B - B' \rVert$, and the parameter vector $a = (a_0, a_1, a_2)^T$ can be determined by the least squares solution:

$$a = (X^T X)^{-1} X^T y \qquad (4)$$

In Equation (4), X is an L × 3 matrix whose first column is all ones, whose second column contains all the chosen pixels of band B1, and whose third column contains all the chosen pixels of band B2, and y is the L × 1 vector that contains all the chosen pixels of band B. The band with the maximum error is considered the band most dissimilar to B1 and B2. Therefore, it can be used as B3 and added to the subset Φ. The procedure is repeated until the number of selected bands in the subset Φ meets the requirements for band selection.
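The LP-based selection loop can be sketched as follows. This is an illustrative reading of the procedure rather than the authors' code: the least squares parameters are obtained from the normal equations with a small hand-written solver so that the example needs only the standard library, the already selected bands are assumed to be linearly independent, and the names `solve_linear_system`, `lp_error`, and `lp_band_selection` are hypothetical.

```python
import math


def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution


def lp_error(selected, target):
    """Norm of the residual when `target` is predicted from `selected`."""
    # Columns of X: a column of ones followed by the selected bands.
    columns = [[1.0] * len(target)] + [[float(v) for v in b] for b in selected]
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in columns] for ci in columns]
    xty = [sum(p * q for p, q in zip(ci, target)) for ci in columns]
    params = solve_linear_system(xtx, xty)      # least squares parameters
    predicted = [
        sum(a * col[i] for a, col in zip(params, columns))
        for i in range(len(target))
    ]
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(target, predicted)))


def lp_band_selection(bands, initial_pair, num_bands):
    """Grow the band subset by repeatedly adding the worst-predicted band."""
    selected = list(initial_pair)
    while len(selected) < num_bands:
        remaining = [i for i in range(len(bands)) if i not in selected]
        errors = {
            i: lp_error([bands[j] for j in selected], bands[i]) for i in remaining
        }
        selected.append(max(errors, key=errors.get))
    return selected


b0 = [1, 2, 3, 4, 5, 6]
b1 = [1, 0, 1, 0, 1, 0]
b2 = [3, 4, 7, 8, 11, 12]      # equals 2 * b0 + b1, fully predictable
b3 = [1, 4, 9, 16, 25, 36]     # nonlinear in b0, poorly predictable
b4 = [2, 3, 4, 5, 6, 7]        # equals b0 + 1, fully predictable
result = lp_band_selection([b0, b1, b2, b3, b4], initial_pair=(0, 1), num_bands=3)
print(result)
```

In the toy data, the two bands that are exact linear combinations of the initial pair have an error of essentially zero, so the nonlinear band is the one added to the subset.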
2.2.3. Number of Selected Bands
In practical applications, it is difficult to determine the required number of bands. Generally, when an image scene is complicated, such as one that contains numerous categories, it is necessary to choose more bands because the data dimensionality must be sufficiently high to accommodate the categories used in detection and classification. The minimum number of hyperspectral bands can be estimated using the virtual dimensionality (VD) [24] method.
Under normal circumstances, the noise subspace projection (NSP) obtains the largest estimate among the VD methods. Therefore, this method is analyzed in this paper. The NSP relies on an estimate of the noise covariance matrix. Many methods exist for noise covariance matrix estimation, such as the residual analysis method developed by Roger [25], which is particularly suited for use with hyperspectral image data. Therefore, we apply the estimation method based on the residual errors.
2.3. Sea Ice Classification Model
The traditional threshold segmentation method is not efficient for high-dimensional data, and it is difficult to obtain the optimal threshold [26]. Because SVM shows outstanding performance in solving small-sample, high-dimensional classification problems, we selected SVM as the benchmark classifier to classify sea ice [20].
Assume that a training sample set T consists of N independent samples, denoted as $T = \{(x_i, y_i)\}_{i=1}^{N}$, where xi represents the training samples and yi ∈ {−1, +1} represents the associated labels. The basic idea of SVM is to map the data from the original sample space into a high-dimensional feature space by introducing a kernel function, and to find there an optimal classification hyperplane that maximizes the margin between the two classes [20].
The classification problem can be transformed into a typical convex programming problem based on the Kuhn–Tucker theorem. In turn, the convex programming problem can be transformed into the following dual programming problem through the Lagrange multipliers αi, which are associated with the original training patterns xi:

$$\max_{\alpha} \; \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \quad \text{subject to} \quad \sum_{i=1}^{N} \alpha_i y_i = 0, \;\; \alpha_i \ge 0, \;\; i = 1, \ldots, N \qquad (5)$$

Because the problem is convex, the dual programming problem has a globally optimal solution. The Lagrange multipliers αi corresponding to the non-support vectors are zero; therefore, the optimal classification decision function for binary classification can be obtained by solving the above problem:

$$f(x) = \operatorname{sgn}\Bigl( \sum_{x_i \in SV} \alpha_i y_i K(x_i, x) + b \Bigr) \qquad (6)$$

where SV is the set of support vectors, αi and b are the parameters used to determine the optimal classification hyperplane, and K(∙,∙) is the kernel function. We chose the radial basis kernel function, which yields a better classification result [20].
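To make the role of the support vectors and the kernel concrete, the sketch below evaluates a binary decision function with a radial basis kernel. It assumes the common form K(u, v) = exp(−γ‖u − v‖²); the width parameter `gamma`, the function names, and the support vectors, multipliers, and bias are hand-picked hypothetical values for illustration, not the output of any training run or values from the paper.

```python
import math


def rbf_kernel(u, v, gamma):
    """Radial basis kernel exp(-gamma * ||u - v||^2)."""
    squared_distance = sum((p - q) ** 2 for p, q in zip(u, v))
    return math.exp(-gamma * squared_distance)


def decision_value(x, support_vectors, alphas, labels, bias, gamma):
    """Weighted kernel sum over the support vectors plus the bias."""
    total = sum(
        alpha * label * rbf_kernel(sv, x, gamma)
        for sv, alpha, label in zip(support_vectors, alphas, labels)
    )
    return total + bias


def classify(x, support_vectors, alphas, labels, bias, gamma):
    """Sign of the decision value; a value of exactly zero is mapped to +1."""
    value = decision_value(x, support_vectors, alphas, labels, bias, gamma)
    return 1 if value >= 0 else -1


# Hypothetical model: one support vector per class, equal multipliers so
# that the weighted labels sum to zero.
support_vectors = [[0.0, 0.0], [2.0, 2.0]]
alphas = [1.0, 1.0]
labels = [-1, 1]
bias = 0.0
gamma = 0.5

near_negative = classify([0.2, 0.1], support_vectors, alphas, labels, bias, gamma)
near_positive = classify([1.9, 2.2], support_vectors, alphas, labels, bias, gamma)
print(near_negative, near_positive)
```

Only the support vectors enter the sum, which is why samples whose multipliers are zero play no part in classification.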
The SVM is inherently a binary classifier. Multiclass classification problems are solved by decomposing them into multiple binary classification problems. In view of hyperspectral sea ice data characteristics such as high dimensionality, multiple features, and large amounts of data, we use the one-against-one (OAO) method [20] to solve the multiclass problem. In the OAO method, given k classes in the data, we construct a classifier between every two categories (e.g., i and j). Consequently, we must construct k(k − 1)/2 binary classifiers altogether, and each binary classifier determines whether a sample belongs to category i or category j. Based on the decision functions of the binary classification problems, the final decision in the OAO strategy is made on a “winner-takes-all” basis through the following maximization:

$$M(x) = \arg\max_{i = 1, \ldots, k} f_i(x) \qquad (7)$$

where argmax(·) returns the subscript that maximizes the value in the parentheses. Namely, M(x) takes the value of the subscript i for which $f_i(x)$ is the maximum among the k decision functions, and category i is the class to which the sample point x is assigned.
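The OAO strategy can be sketched with a simple voting scheme. The example assumes that each pairwise classifier returns a positive value when it favors the first class of its pair, that the per-class score is the number of pairwise wins, and that ties go to the class listed first; the toy one-dimensional classifiers built from class centers and all names are hypothetical and stand in for trained SVMs.

```python
from itertools import combinations


def one_against_one_predict(x, classes, pairwise_classifiers):
    """Winner-takes-all vote over all k(k - 1)/2 pairwise classifiers.

    pairwise_classifiers[(i, j)](x) is assumed to be positive when the
    classifier favors class i and non-positive when it favors class j.
    """
    votes = {c: 0 for c in classes}
    for i, j in combinations(classes, 2):
        winner = i if pairwise_classifiers[(i, j)](x) > 0 else j
        votes[winner] += 1
    # The class with the most votes wins; ties go to the class listed first.
    predicted = max(classes, key=lambda c: votes[c])
    return predicted, votes


# Toy stand-ins for trained binary SVMs: each pairwise classifier prefers
# the class whose (hypothetical) one-dimensional center is nearer to x.
centers = {0: 0.0, 1: 1.0, 2: 2.0}
classes = sorted(centers)
pairwise_classifiers = {
    (i, j): (lambda x, ci=centers[i], cj=centers[j]: abs(x - cj) - abs(x - ci))
    for i, j in combinations(classes, 2)
}

predicted, votes = one_against_one_predict(1.8, classes, pairwise_classifiers)
print(predicted, votes)
```

With three classes, three pairwise classifiers are built, which matches the k(k − 1)/2 count for k = 3.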
4. Conclusions
The goal of this work was to improve the performance of hyperspectral sea ice detection. In this paper, we proposed an improved similarity measurement method for dimensionality reduction of sea ice hyperspectral images to obtain the optimal band combination, which both contains a large amount of information and exhibits low similarity. The proposed method was compared with the LP, entropy, and ABS methods with regard to classification accuracy in two experiments. According to the experimental results, the ISMLP method overall performs better in detecting sea ice than the traditional methods. From the experimental results, we can summarize the following:
Considering each band’s information and the similarity between bands, the proposed ISMLP method produces the best classification accuracy compared to the traditional methods while also greatly reducing the data dimensions.
Considering the spectral characteristics of sea ice, this work chose bands that have good spectral characteristics and separability and applied the band selection of the hyperspectral image to only those bands. This approach effectively reduces the scope of the original bands and enhances the efficiency of the method.
Considering the high spatial correlations in the hyperspectral images, we selected pixels based on K-means clustering and analyzed the changes in the classification accuracy resulting from pixel selection at different proportions. The results revealed that selecting a proportion of approximately 1/100 pixels with K-means maintains the balance between efficiency and performance. That is, the 1/100 pixel-selection proportion reduced the computational cost while simultaneously achieving higher classification accuracy than other pixel-selection proportions.
According to the results of the experiments, our method obtains superior performance with fewer dimensions and higher efficiency. It should be noted that the influence of snow coverage was not considered in this paper; in future studies, we plan to combine our method with an image texture analysis method to eliminate the influence of weather conditions.