2.1. The Similar Pixel Selection Method in ESTARFM
ESTARFM assumes that the difference in reflectance between low- and high-resolution images acquired at the same time is caused only by systematic errors, so the relationship between the reflectance of the low- and high-resolution images is taken to be linear. In actual images, however, noise and geometric misregistration between images acquired by different sensors are unavoidable. ESTARFM therefore selects similar pixels near the central pixel of a moving window using a spectral threshold that is determined by the standard deviation of the reflectance of the entire image and the estimated number of land cover types, as shown in Equation (1).
$$\left| F(x_i, y_i, t, B) - F(x_{w/2}, y_{w/2}, t, B) \right| \le \sigma(B) \times \frac{2}{m} \quad (1)$$
where F is the reflectance of the high-resolution image, t is the image acquisition time, B is the band, $(x_i, y_i)$ is the coordinate pair of a neighboring pixel, $(x_{w/2}, y_{w/2})$ is the coordinate pair of the central pixel of the moving window of size w, σ(B) is the standard deviation of the reflectance of the image in band B, and m is the number of land cover classes. Pixels that satisfy this relationship are identified as spectrally similar pixels. In Figure 2, the black polyline is the spectral curve of the central pixel, and all pixels whose spectral values fall within the red polylines are selected as similar pixels. Correct selection of spectrally similar pixels is of great significance for the fusion process because it ensures the accuracy of the spectral information of the predicted central pixel [5].
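The threshold test can be sketched in a few lines of Python. This is a minimal illustration, not the ESTARFM implementation: it assumes the threshold has the form σ(B) × 2/m, that a pixel must pass the test in every band, and that the window is a small nested list. The function name `select_similar_spectral` and the reflectance values are hypothetical.

```python
def select_similar_spectral(window, band_std, n_classes):
    """Return (row, col) positions of spectrally similar pixels.

    window    : square 2D list; each pixel is a list of band reflectances
    band_std  : per-band standard deviation of the whole image, sigma(B)
    n_classes : estimated number of land cover classes, m
    Assumption: a pixel is similar only if every band passes the test.
    """
    c = len(window) // 2
    center = window[c][c]
    # Assumed threshold form: sigma(B) * 2 / m, one value per band.
    thresholds = [2.0 * s / n_classes for s in band_std]
    similar = []
    for r, row in enumerate(window):
        for col, pixel in enumerate(row):
            if all(abs(p - q) <= t
                   for p, q, t in zip(pixel, center, thresholds)):
                similar.append((r, col))
    return similar


# Hypothetical 3 x 3 window with two bands per pixel.
WINDOW = [
    [[0.10, 0.30], [0.11, 0.31], [0.30, 0.10]],
    [[0.10, 0.29], [0.10, 0.30], [0.12, 0.33]],
    [[0.40, 0.05], [0.09, 0.30], [0.10, 0.45]],
]
SIMILAR = select_similar_spectral(WINDOW, band_std=[0.05, 0.05], n_classes=4)
```

Because the threshold depends only on whole-image statistics, any pixel whose spectrum happens to lie close to the central spectrum passes the test, regardless of which land cover types produced it.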
Because the same land object may exhibit different spectral characteristics in remote sensing images [16], selecting similar pixels based on the spectral threshold alone introduces a certain degree of error. Furthermore, mixed pixels in remote sensing images, especially in heterogeneous areas, tend to increase the uncertainty of similar-pixel selection based on mixed-pixel spectral reflectance [17]. Figure 3 shows a simulated image that contains three areas of different land cover types: forest, crop in the growing season, and bare soil (the spectra are derived from an actual Landsat OLI image). The cross areas are mixed pixels composed of 50 percent of each neighboring land cover type. The spectra of the pure and mixed pixels are shown in Figure 4 and Figure 5. The simulated image shows that the spectral threshold cannot separate mixed pixels of different land cover types within the moving window. As shown in Figure 5, the central pixel is a mixed pixel consisting of forest and bare soil, yet the mixed pixels composed of crop and bare soil within the moving window are also selected as spectrally similar to the central pixel according to the spectral threshold (here calculated from the simulated image). If the threshold were calculated from the complete remote sensing image, as ESTARFM does, it would be larger and would cause more incorrect selections.
2.2. Improved Selection of Similar Pixels
In view of these shortcomings of the spectral threshold, we propose an approach that selects similar pixels using spectral mixture analysis, with the aim of improving ESTARFM and reducing the uncertainty in image fusion.
The basic assumption of spectral mixture analysis is that the land surface is composed of a few features (i.e., endmembers) whose spectral characteristics are stable [18]. Each pixel can be represented by its endmember spectra and their proportional fractions within the pixel. Spectral mixture analysis thus provides spatial information at the sub-pixel level, so that the components of a pixel can be identified more accurately.
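The endmember fractions of each mixed pixel are obtained using a constrained least squares solution (Equation (2)):
$$\rho'_k = \sum_{i=1}^{N} f_i \, \rho_{i,k} + \varepsilon_k \quad (2)$$
where $\rho'_k$ is the reflectance of the mixed pixel in band k, $f_i$ is the fraction value of endmember i, $\rho_{i,k}$ is the reflectance of endmember i in band k, $\varepsilon_k$ is the residual value, and N is the number of endmembers. The endmember fraction values satisfy the following constraints:
$$\sum_{i=1}^{N} f_i = 1, \qquad 0 \le f_i \le 1 \quad (3)$$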
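For the two-endmember case, the constrained least squares problem has a closed-form solution, which makes a compact illustration. The sketch below assumes a linear mixture with the sum-to-one and non-negativity constraints; the function name `unmix_two` and the four-band spectra are hypothetical and are not taken from the Landsat data used in this study.

```python
def unmix_two(pixel, em_a, em_b):
    """Constrained least squares for a two-endmember linear mixture.

    Solves pixel ~ f_a * em_a + f_b * em_b subject to
    f_a + f_b = 1 and 0 <= f_a, f_b <= 1.
    Returns (f_a, f_b, residuals), one residual per band.
    """
    diff = [a - b for a, b in zip(em_a, em_b)]
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        raise ValueError("the two endmember spectra must differ")
    # Substituting f_b = 1 - f_a leaves a one-dimensional least squares
    # problem; clipping its solution to [0, 1] gives the constrained optimum.
    f_a = sum((p - b) * d for p, b, d in zip(pixel, em_b, diff)) / denom
    f_a = max(0.0, min(1.0, f_a))
    f_b = 1.0 - f_a
    residuals = [p - (f_a * a + f_b * b)
                 for p, a, b in zip(pixel, em_a, em_b)]
    return f_a, f_b, residuals


# Hypothetical four-band endmember spectra (illustrative values only).
FOREST = [0.03, 0.05, 0.04, 0.40]
SOIL = [0.10, 0.15, 0.20, 0.30]
MIXED = [0.5 * f + 0.5 * s for f, s in zip(FOREST, SOIL)]
F_FOREST, F_SOIL, RESIDUALS = unmix_two(MIXED, FOREST, SOIL)
```

With more than two endmembers, the same constraints are usually handled by an iterative solver rather than a closed form.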
In previous studies [19,20], the commonly used spectral mixture analysis model is a fixed-endmember model, in which each type of ground object is represented by a single endmember spectrum. This ignores the fact that the same object may have different spectra, which limits the model. Multiple endmember spectral mixture analysis (MESMA), proposed by Roberts et al., is a linear unmixing model [21] that employs variable endmember spectra and uses an endmember selection rule to choose a mixture model for each pixel.
The MESMA model is used to decompose the mixed pixels of the Landsat OLI image. Firstly, based on the Vegetation (V)-Impervious surface (I)-Soil (S) (V-I-S) model [22], vegetation, impervious surface, and bare soil are selected as the basic endmembers; impervious surfaces are anthropogenic features such as rooftops, roads, driveways, and sidewalks. Secondly, the original endmember spectral library is obtained using the pixel purity index (PPI) and the image scatter plot [23]. Thirdly, three indices are calculated: the Count-Based Index (CoBI), the Endmember Average RMSE (EAR), and the Minimum Average Spectral Angle (MASA). Finally, following the rules of maximum CoBI and minimum EAR and MASA, the spectral curves of each endmember are selected from the images, and the vegetation, impervious surface, and bare soil spectral libraries are established.
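CoBI is computed from the number of spectra modeled by an endmember within the endmember's class (in_CoB) and outside of the endmember's class (out_CoB):
$$\mathrm{CoBI} = \frac{in\_CoB}{out\_CoB \times n} \quad (4)$$
where n is the number of endmember models. EAR and MASA are calculated as
$$\mathrm{EAR}_i = \frac{\sum_{j=1}^{n} \mathrm{RMSE}_{i,j}}{n - 1} \quad (5)$$
$$\mathrm{MASA}_i = \frac{\sum_{j=1}^{n} \theta_{i,j}}{n - 1} \quad (6)$$
where i is the serial number of an endmember and j is the modeled spectrum; the spectral angle θ is expressed as follows:
$$\theta = \cos^{-1}\left( \frac{\sum_{k=1}^{\lambda} \rho_k \, \rho'_k}{L_{\rho} \, L_{\rho'}} \right) \quad (7)$$
where $\rho_k$ is the reflectance of an endmember, $\rho'_k$ is the reflectance of a modeled spectrum, $L_{\rho}$ is the length of the endmember vector, and $L_{\rho'}$ is the length of the modeled spectrum vector.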
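The spectral angle and the two averaging indices can be illustrated with a short Python sketch. Several choices here are assumptions rather than details confirmed by the text: each average is taken over the other n − 1 spectra of the same class library, and the RMSE used for EAR comes from fitting each spectrum with a single endmember scaled by a fraction between 0 and 1 (an endmember plus zero-reflectance shade). The names `spectral_angle`, `fit_rmse`, `ear_and_masa` and the library values are hypothetical.

```python
import math


def spectral_angle(endmember, spectrum):
    """Angle in radians between two reflectance vectors."""
    dot = sum(a * b for a, b in zip(endmember, spectrum))
    len_e = math.sqrt(sum(a * a for a in endmember))
    len_s = math.sqrt(sum(b * b for b in spectrum))
    # Clamp so rounding error cannot push acos outside its domain.
    cos_theta = max(-1.0, min(1.0, dot / (len_e * len_s)))
    return math.acos(cos_theta)


def fit_rmse(endmember, spectrum):
    """RMSE of modelling `spectrum` as f * endmember with 0 <= f <= 1.

    Assumption: one bright endmember plus zero-reflectance shade.
    """
    f = (sum(a * b for a, b in zip(endmember, spectrum))
         / sum(a * a for a in endmember))
    f = max(0.0, min(1.0, f))
    residuals = [b - f * a for a, b in zip(endmember, spectrum)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


def ear_and_masa(library):
    """Return (EAR, MASA) lists, one value per spectrum in the library."""
    n = len(library)
    ear, masa = [], []
    for i, em in enumerate(library):
        others = [s for j, s in enumerate(library) if j != i]
        ear.append(sum(fit_rmse(em, s) for s in others) / (n - 1))
        masa.append(sum(spectral_angle(em, s) for s in others) / (n - 1))
    return ear, masa


# Hypothetical three-band class library (illustrative values only).
LIBRARY = [[0.2, 0.4, 0.6], [0.1, 0.2, 0.3], [0.3, 0.2, 0.1]]
EAR, MASA = ear_and_masa(LIBRARY)
```

Under the minimum-EAR and minimum-MASA rules, the spectrum with the smallest value in each list would be the preferred representative of its class.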
In MESMA, the unmixing process is also based on Equations (2) and (3), and the root mean square error (RMSE) (Equation (8)) is employed as the evaluation index for pixel decomposition:
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{k=1}^{\lambda} \varepsilon_k^{2}}{\lambda}} \quad (8)$$
where $\varepsilon_k$ is the fitted residual of band k, and λ is the total number of spectral bands. For each pixel, the inversion is performed for different endmember combinations, and the result with the smallest RMSE is selected as the final result. Thus, each pixel has its own endmember mixture model, which is the most suitable combination of endmembers. For instance, in Figure 3, the two kinds of mixed land are assigned different endmember models (vegetation (crop) and bare soil; vegetation (forest) and bare soil). Compared with fixed-endmember mixture analysis, this method better accounts for the same object having different spectral characteristics in the actual image and yields more accurate fraction estimates. MESMA can therefore effectively address the problem that the same object may have different spectra [24].
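The per-pixel model selection can be sketched as follows. This is a simplified illustration under stated assumptions, not the MESMA implementation used in the study: only two-endmember models are considered, each is solved with the sum-to-one and non-negativity constraints, and the model with the smallest RMSE is kept. The function names, model labels, and spectra are hypothetical.

```python
import math


def unmix_two(pixel, em_a, em_b):
    """Two-endmember constrained unmixing; returns (f_a, residuals)."""
    diff = [a - b for a, b in zip(em_a, em_b)]
    denom = sum(d * d for d in diff)
    f_a = sum((p - b) * d for p, b, d in zip(pixel, em_b, diff)) / denom
    f_a = max(0.0, min(1.0, f_a))  # enforce 0 <= f_a <= 1, f_b = 1 - f_a
    residuals = [p - (f_a * a + (1.0 - f_a) * b)
                 for p, a, b in zip(pixel, em_a, em_b)]
    return f_a, residuals


def rmse(residuals):
    """Root mean square of the per-band residuals."""
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


def select_model(pixel, candidate_models):
    """Try every candidate model and keep the one with the smallest RMSE.

    candidate_models maps a label to a pair of endmember spectra.
    Returns (label, (f_a, f_b), rmse).
    """
    best = None
    for name, (em_a, em_b) in candidate_models.items():
        f_a, residuals = unmix_two(pixel, em_a, em_b)
        error = rmse(residuals)
        if best is None or error < best[2]:
            best = (name, (f_a, 1.0 - f_a), error)
    return best


# Hypothetical four-band spectra (illustrative values only).
FOREST = [0.03, 0.05, 0.04, 0.40]
CROP = [0.04, 0.08, 0.05, 0.50]
SOIL = [0.10, 0.15, 0.20, 0.30]
MODELS = {"forest+soil": (FOREST, SOIL), "crop+soil": (CROP, SOIL)}

PIXEL = [0.5 * f + 0.5 * s for f, s in zip(FOREST, SOIL)]
BEST_NAME, FRACTIONS, ERROR = select_model(PIXEL, MODELS)
```

In this toy case the forest and soil mixture is reproduced almost exactly by the forest+soil model, while the crop+soil model leaves a clearly larger residual, so the two kinds of mixed pixel receive different model labels.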
In the implementation of I-ESTARFM, the endmember mixture model and the fraction values of the high-resolution image (Landsat image) are first obtained through spectral mixture analysis using MESMA. The endmember types and the fraction values of the mixed pixel are then used as the basis for searching for neighboring spectrally similar pixels within the moving window.
The similar pixels are first selected according to the endmember mixture model of the central mixed pixel of the moving window: all pixels in the window that have the same endmember mixture model as the central pixel are initially selected as similar pixels. Each mixture model consists of endmember types derived from the V-I-S model, with the most suitable spectrum selected for each kind of land object (V-I-S) and at most one spectrum per kind of land object in a single model. The similar pixels are then further screened based on the endmember fraction values, the standard deviation of the fractions over the whole image, and the number of endmembers (Figure 6, Equations (9) and (10)).
where D is the endmember type, f is the endmember fraction value of the image, $(x_i, y_i)$ is the coordinate pair of the neighboring spectrally similar pixels, $(x_{w/2}, y_{w/2})$ is the coordinate pair of the central pixel of the moving window (w being the window size), σ(D) is the standard deviation of the fraction over the whole image, and k is the number of endmembers.
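The two-step selection can be sketched in Python as follows. The first step (same endmember mixture model as the central pixel) follows the description above. The form of the second step is an assumption made for illustration: by analogy with the spectral threshold, a pixel is kept when its fraction for every endmember type D differs from the central fraction by at most σ(D) × 2/k. The function name, model labels, and fraction values are hypothetical.

```python
def select_similar_mesma(models, fractions, frac_std, k):
    """Two-step similar-pixel selection inside one moving window.

    models    : square 2D list of endmember mixture model labels
    fractions : square 2D list of dicts {endmember type D: fraction f}
    frac_std  : dict {D: standard deviation of f over the whole image}
    k         : the endmember number used to scale the threshold
    Assumed threshold form: |f - f_center| <= sigma(D) * 2 / k.
    """
    c = len(models) // 2
    center_model = models[c][c]
    center_frac = fractions[c][c]
    selected = []
    for r, row in enumerate(models):
        for col, model in enumerate(row):
            # Step 1: keep only pixels with the central pixel's model.
            if model != center_model:
                continue
            # Step 2: fraction test for every endmember type.
            frac = fractions[r][col]
            if all(abs(frac[d] - center_frac[d]) <= 2.0 * frac_std[d] / k
                   for d in center_frac):
                selected.append((r, col))
    return selected


# Hypothetical window: "FS" = forest + soil model, "CS" = crop + soil model.
MODEL_LABELS = [["FS", "FS", "CS"], ["FS", "FS", "CS"], ["CS", "FS", "FS"]]
V_FRACTIONS = [[0.55, 0.90, 0.50], [0.40, 0.50, 0.50], [0.50, 0.65, 0.75]]
FRACTION_MAPS = [[{"V": v, "S": 1.0 - v} for v in row] for row in V_FRACTIONS]
SELECTED = select_similar_mesma(
    MODEL_LABELS, FRACTION_MAPS, frac_std={"V": 0.3, "S": 0.3}, k=3)
```

In this toy window, the crop and soil pixels are rejected in the first step even when their fractions equal those of the central pixel, which is the behavior the improved method relies on.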
Figure 7 illustrates the different results of similar-pixel selection by the spectral threshold method and the improved method. The cropland in the lower left corner is a wrong selection, and when the prediction date is outside the crop's growing season, the fused image would be noticeably biased. In contrast, the improved method eliminates the wrong similar pixels in the first step (Equation (9)), because different mixed pixels are assigned different endmember mixture models, which improves the accuracy of the fusion result.