1. Introduction
In the rapidly advancing field of hyperspectral imaging, imaging spectrometers have evolved considerably and can now acquire hundreds of spectral bands directly from the Earth’s surface [1,2,3,4]. This progress yields hyperspectral images (HSIs) that describe the chemical constituents of terrestrial features with high precision. Such images provide diagnostic spectral information that can discriminate between distinct objects [5], which makes them invaluable for detection purposes [6,7,8,9,10,11]. However, owing to the spectral diversity intrinsic to natural materials and the limited spatial resolution of spectral sensors, a small object may occupy only a fraction of a single pixel and become mixed with the background [12,13]. In that case, the spectral signature of the mixed pixel reflects the absorption features of multiple endmembers and may deviate from that of the pure object. When the focus is narrowed to a single endmember within a mixed pixel, the task becomes one of subpixel object detection [14,15]. This task cannot be addressed using spatial information or simple spectral matching.
The cornerstone of object detection is the effective differentiation of objects from their surrounding environment [16,17]. Subpixel object detection methods can be classified according to how they construct the background model. One class describes the background with a statistical distribution and is referred to as unstructured background detection. Examples include the matched filter (MF) [18]; the adaptive coherence estimator (ACE) [19]; the adaptive matched filter (AMF) [20]; the subpixel spectral matched filter (SPSMF) [21], an advancement derived from the AMF; and the local segmented adaptive cosine estimator (SACE) [22]. A second class, structured background detectors, uses subspace models to describe background variation. This class comprises the orthogonal subspace projection (OSP) [23], the adaptive subspace detector (ASD) [24], and the kernel-based variant of the OSP detector [25]. A third, currently widely used class is based on linear mixing models. Its mathematical structure resembles that of subspace models, but it is distinguished by the use of endmember spectra extracted from the observed data, which represent the object information intrinsic to the image. Notable methods include the hybrid subpixel detector by Broadwater et al. [26], which combines the fully constrained least-squares (FCLS) algorithm with the adaptive matched subspace detector (AMSD) and ACE for detection, as well as the hybrid selective endmember unstructured detector (HSEUD) and the hybrid selective endmember structured detector (HSESD) introduced by Du et al. [27].
Sparse representation theory has been widely used in image processing and has been introduced into hyperspectral subpixel object detection. These methods construct a background model based on the sparsity of hyperspectral data and then combine it with a binary hypothesis test to obtain the detection result. Representative algorithms include supervised object detection using combined sparse and collaborative representation (CSCR) [28], subpixel detection based on background joint sparse representation detection (BJSRD) [29], dictionary reconstruction-based subpixel object detection [30], and hyperspectral subpixel pixel reconstructed detection (HSPRD) [31].
Although the aforementioned algorithms exploit the sparsity of hyperspectral data for background reconstruction, they require the dictionary size to be preset when building the background sparse reconstruction dictionary and cannot adapt it to the hyperspectral image being processed. As a result, these algorithms generalize poorly and are difficult to transfer quickly to hyperspectral data from different scenes and sources.
To address this problem, mixed pixels are decomposed using spectral dictionary learning to improve the accuracy of hyperspectral image background reconstruction, and to obtain background endmembers while calculating the corresponding abundance coefficients. Additionally, in this study, an adaptive scale constraint is introduced for the purpose of enhancing the adaptability of background endmember extraction to obtain better detection results from hyperspectral images without adjusting algorithm parameters, as well as to improve the generalization and practicality of the algorithm.
The main contributions of the proposed background endmember extraction approach are presented as follows:
An adaptive scale constraint is introduced into the process of background endmember dictionary learning to achieve the simultaneous extraction of the background spectral endmembers and their quantities.
A novel background endmember extraction method based on adaptive background endmember dictionary learning is proposed to improve the detection abilities of pixel reconstruction-based subpixel detectors for small objects.
The remainder of this paper is organized as follows. Section 2 introduces the principles of some widely used subpixel detection approaches. Section 3 presents a detailed analysis of the subpixel object detection model. Section 4 describes the proposed method for background endmember extraction based on adaptive dictionary learning. Section 5 introduces a subpixel object detector based on background reconstruction. Section 6 reports the experimental validation of the proposed approach, and Section 7 summarizes the study.
4. Adaptive Dictionary Learning-Based Background Endmember Extraction
According to dictionary learning and sparse representation theory, the background endmember matrix described in Equation (14) can be regarded as a dictionary matrix learned from the image under detection. Each dictionary atom represents a background endmember spectrum that acts as a component of a mixed pixel, and the associated abundance coefficient vector is a sparse code. The background endmember matrix can therefore be obtained by dictionary learning. Once the background endmembers have been retrieved, the abundance matrix is solved accordingly.
Building on this idea, a hyperspectral image unmixing technique based on online dictionary learning was introduced in [32]. It follows the traditional dictionary learning paradigm, in which the background endmember dictionary is obtained by optimizing an objective function in the least absolute shrinkage and selection operator (LASSO) framework as follows:
$$\min_{\mathbf{D},\,\mathbf{A}} \; \sum_{i=1}^{N} \left( \frac{1}{2}\left\| \mathbf{x}_i - \mathbf{D}\mathbf{a}_i \right\|_2^2 + \lambda \left\| \mathbf{a}_i \right\|_1 \right) \qquad (15)$$
where $\lambda$ represents a regularization parameter, $\mathbf{x}_i$ denotes the spectral vector of the $i$th pixel in the HSI matrix $\mathbf{X}$, $\mathbf{D}$ is the background endmember dictionary, $\mathbf{a}_i$ is the abundance coefficient vector of the $i$th pixel (the $i$th column of $\mathbf{A}$), and $N$ signifies the total number of pixels in the hyperspectral dataset under detection.
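To make the LASSO objective concrete, the following minimal NumPy sketch evaluates its data-fidelity and sparsity terms for a given dictionary and abundance matrix. The function name `lasso_objective`, the variable names `X`, `D`, `A`, and `lam`, the 1/2 scaling of the quadratic term, and the toy dimensions are illustrative assumptions and are not taken from [32].

```python
import numpy as np

def lasso_objective(X, D, A, lam):
    """Sum over pixels of 0.5*||x_i - D a_i||_2^2 + lam*||a_i||_1.

    X: (bands, pixels) HSI matrix, D: (bands, atoms) dictionary,
    A: (atoms, pixels) abundance matrix. All names are illustrative.
    """
    residual = X - D @ A
    fidelity = 0.5 * np.sum(residual ** 2)   # reconstruction error term
    sparsity = lam * np.sum(np.abs(A))       # l1 penalty on the abundances
    return fidelity + sparsity

rng = np.random.default_rng(0)
D = rng.random((50, 4))                      # 50 bands, 4 atoms (fixed in advance)
A = rng.random((4, 200)) * (rng.random((4, 200)) < 0.5)  # sparse abundances
X = D @ A                                    # noise-free mixed pixels
value_at_truth = lasso_objective(X, D, A, lam=0.1)
value_at_zero = lasso_objective(X, D, np.zeros_like(A), lam=0.1)
```

In this sketch the number of columns of `D` has to be chosen before the objective can be evaluated at all.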
In Equation (15), the number of atoms in the background spectrum dictionary (i.e., the total number of background endmember spectra) is not determined by the optimization itself: it must be fixed and supplied as a known input before dictionary learning begins. Since different hyperspectral images usually contain different numbers of endmembers, this number has to be estimated, or set empirically, for each image before detection. Inaccurate estimates of the number of endmember spectra can then arise, and these estimation errors can degrade the generalization performance of the algorithm.
Therefore, in this paper, a dictionary size-adaptive method is adopted to obtain the background endmember spectra and their number simultaneously. A dictionary size penalty term is added to Equation (15); it can be described as a row sparsity constraint of the kind commonly used in signal processing [33]. Moreover, for practical applications, for any pixel $\mathbf{x}_i$ of the hyperspectral dataset, the background endmember dictionary $\mathbf{D}$ and the corresponding abundance coefficient vector $\mathbf{a}_i$ should be constrained to non-negative values.
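To sum up, in the adaptive endmember extraction method described here, the matrix of background spectral endmembers (i.e., the dictionary matrix) is obtained by optimizing the following objective function:
$$\min_{\mathbf{D},\,\mathbf{A}} \; \sum_{i=1}^{N} \left( \frac{1}{2}\left\| \mathbf{x}_i - \mathbf{D}\mathbf{a}_i \right\|_2^2 + \lambda \left\| \mathbf{a}_i \right\|_1 \right) + \beta \sum_{j=1}^{K} I\left(\mathbf{a}^j\right) \quad \text{s.t.} \quad \mathbf{D} \ge 0,\; \mathbf{a}_i \ge 0,\; i = 1, \ldots, N \qquad (16)$$
where $\mathbf{a}^j = [a_{j1}, a_{j2}, \ldots, a_{jN}]$ is the background endmember indicator vector (BEIV), i.e., the $j$th row of the abundance matrix $\mathbf{A}$; $K$ is the number of atoms in the initial dictionary; and $a_{ji}$ is the $j$th entry of the $i$th pixel abundance coefficient vector $\mathbf{a}_i$. The BEIV measures the significance of the $j$th background spectrum endmember obtained during learning by zero-element counting: if all of its entries are zero, the endmember is not used by any pixel. In the last term of Equation (16), $\sum_{j=1}^{K} I(\mathbf{a}^j)$ is the dictionary-scale penalty term that constrains the number of background spectrum endmembers, and $\beta$ is a balance parameter. The indicator function is defined as
$$I(\mathbf{v}) = \begin{cases} 0, & \mathbf{v} = \mathbf{0} \\ 1, & \text{otherwise} \end{cases} \qquad (17)$$
The indicator outputs 1 for any non-zero vector; therefore, the sum of the indicator functions over all BEIVs, i.e., the dictionary-scale penalty term $\sum_{j=1}^{K} I(\mathbf{a}^j)$, equals the actual number of background spectrum dictionary atoms in use.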
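The counting role of the dictionary-scale penalty can be illustrated with a small NumPy sketch. The function name `dictionary_scale_penalty`, the tolerance argument `tol`, and the toy abundance matrix are hypothetical and serve only to show that summing a row-wise indicator counts the atoms actually used.

```python
import numpy as np

def dictionary_scale_penalty(A, tol=0.0):
    """Count the rows of A (one row per dictionary atom) that are not all zero."""
    row_is_used = np.any(np.abs(A) > tol, axis=1)   # indicator value per row
    return int(row_is_used.sum()), row_is_used

A = np.array([[0.2, 0.0, 0.7],
              [0.0, 0.0, 0.0],    # atom never used by any pixel
              [0.8, 1.0, 0.3],
              [0.0, 0.0, 0.0]])   # atom never used by any pixel
n_used, used_mask = dictionary_scale_penalty(A)
```

Here two of the four atoms carry non-zero abundances, so the penalty evaluates to 2 and the mask identifies which atoms would be retained as background endmembers.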
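Since objective Function (16) contains the multivariate indicator term $I(\mathbf{a}^j)$, the multivariate Moreau proximity index (MMPI) penalty used in the optimization may be defined as
$$e_{\gamma}\left(\mathbf{a}^j\right) = \min_{\mathbf{u}^j} \left\{ I\left(\mathbf{u}^j\right) + \frac{\gamma}{2} \left\| \mathbf{a}^j - \mathbf{u}^j \right\|_2^2 \right\} \qquad (18)$$
where $\mathbf{u}^j$ is an auxiliary row vector. When $\gamma$ is large enough, Equation (18) approximates the multivariate indicator $I(\mathbf{a}^j)$. When $\gamma \to \infty$, $e_{\gamma}(\mathbf{a}^j)$ can be considered almost the same as the multivariate indicator $I(\mathbf{a}^j)$. Therefore, the MMPI penalty in Equation (18) can be used as an approximation of the multivariate indicator term $I(\mathbf{a}^j)$, and objective Function (16) can be rewritten as
$$\min_{\mathbf{D} \ge 0,\, \mathbf{A} \ge 0,\, \mathbf{U}} \; \sum_{i=1}^{N} \left( \frac{1}{2}\left\| \mathbf{x}_i - \mathbf{D}\mathbf{a}_i \right\|_2^2 + \lambda \left\| \mathbf{a}_i \right\|_1 \right) + \beta \sum_{j=1}^{K} \left( I\left(\mathbf{u}^j\right) + \frac{\gamma}{2} \left\| \mathbf{a}^j - \mathbf{u}^j \right\|_2^2 \right) \qquad (19)$$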
To solve the joint optimization problem over the dictionary $\mathbf{D}$ and the sparse solution $\mathbf{A}$ in Equation (19), it is advantageous to alternate between fixing $\mathbf{D}$ and fixing $\mathbf{A}$: each step minimizes the objective with respect to one variable while the other is held constant.
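When the abundance matrix $\mathbf{A}$ is fixed, objective Function (19) reduces to a dictionary update problem, which can be expressed as
$$\min_{\mathbf{D} \ge 0} \; \sum_{i=1}^{N} \frac{1}{2}\left\| \mathbf{x}_i - \mathbf{D}\mathbf{a}_i \right\|_2^2 \qquad (20)$$
Optimization Problem (20) is solved iteratively with a gradient descent algorithm [34], which finally yields the dictionary $\mathbf{D}$.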
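A dictionary update of this kind can be sketched as projected gradient descent on a least-squares fit. The sketch below assumes the quadratic data term 0.5*||X - D A||_F^2, a fixed step of one over the Lipschitz constant, and a projection onto non-negative values after each step; the projection, the function name `update_dictionary`, and the synthetic data are assumptions made for illustration and are not necessarily the procedure of [34].

```python
import numpy as np

def update_dictionary(X, D, A, n_steps=200):
    """Projected gradient descent on 0.5*||X - D A||_F^2 subject to D >= 0."""
    gram = A @ A.T
    lipschitz = np.linalg.norm(gram, 2) + 1e-12    # largest eigenvalue of A A^T
    XAt = X @ A.T
    for _ in range(n_steps):
        grad = D @ gram - XAt                      # gradient with respect to D
        D = np.maximum(D - grad / lipschitz, 0.0)  # gradient step, then projection
    return D

rng = np.random.default_rng(1)
D_true = rng.random((30, 3))        # synthetic non-negative endmembers
A = rng.random((3, 500))            # abundances held fixed during the update
X = D_true @ A
D0 = rng.random((30, 3))            # random initial dictionary
D_hat = update_dictionary(X, D0, A)
err_before = np.linalg.norm(X - D0 @ A)
err_after = np.linalg.norm(X - D_hat @ A)
```

On this noise-free toy problem the reconstruction error drops sharply while every entry of the updated dictionary stays non-negative.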
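When the dictionary $\mathbf{D}$ is fixed, objective Function (16) is transformed into an update problem for the abundance matrix $\mathbf{A}$. After substituting Equation (18) into Equation (16), it can be expressed as
$$\min_{\mathbf{A} \ge 0,\, \mathbf{U}} \; \sum_{i=1}^{N} \left( \frac{1}{2}\left\| \mathbf{x}_i - \mathbf{D}\mathbf{a}_i \right\|_2^2 + \lambda \left\| \mathbf{a}_i \right\|_1 \right) + \beta \sum_{j=1}^{K} \left( I\left(\mathbf{u}^j\right) + \frac{\gamma}{2} \left\| \mathbf{a}^j - \mathbf{u}^j \right\|_2^2 \right) \qquad (21)$$
Similar to $\mathbf{a}^j$, $\mathbf{u}^j$ is the corresponding row vector of matrix $\mathbf{U}$. Here, $\mathbf{u}_i$ represents a column vector of $\mathbf{U}$, so that $\mathbf{U} = [\mathbf{u}_1, \ldots, \mathbf{u}_N]$. Problem (21) is optimized over the variables $\mathbf{A}$ and $\mathbf{U}$. It can be split into two distinct subproblems that are solved in turn.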
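For $\mathbf{A}$, since the abundance coefficients of different pixels are independent, optimization Problem (21) is equivalent to
$$\min_{\mathbf{a}_i \ge 0} \; \frac{1}{2}\left\| \mathbf{x}_i - \mathbf{D}\mathbf{a}_i \right\|_2^2 + \lambda \left\| \mathbf{a}_i \right\|_1 + \frac{\beta\gamma}{2} \left\| \mathbf{a}_i - \mathbf{u}_i \right\|_2^2, \quad i = 1, \ldots, N \qquad (22)$$
which can be solved effectively using an iterative shrinkage-thresholding method.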
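A per-pixel solver of the iterative shrinkage-thresholding type can be sketched as follows. The sketch assumes a non-negative LASSO problem with an optional quadratic coupling term 0.5*c*||a - u||^2 toward an auxiliary vector `u`; this coupling is an assumption about how the MMPI penalty enters the per-pixel problem, and setting `c = 0` gives plain non-negative LASSO. The function name `nonneg_ista` and all parameter values are illustrative.

```python
import numpy as np

def nonneg_ista(x, D, lam, u=None, c=0.0, n_iter=500):
    """ISTA for min_{a>=0} 0.5*||x - D a||^2 + lam*sum(a) + 0.5*c*||a - u||^2."""
    n_atoms = D.shape[1]
    u = np.zeros(n_atoms) if u is None else u
    a = np.zeros(n_atoms)
    step = 1.0 / (np.linalg.norm(D, 2) ** 2 + c)   # 1 / Lipschitz constant
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x) + c * (a - u)     # gradient of the smooth part
        a = np.maximum(a - step * grad - step * lam, 0.0)  # shrink, then project
    return a

rng = np.random.default_rng(2)
D = rng.random((40, 6))                            # 6 candidate endmembers
a_true = np.array([0.6, 0.0, 0.4, 0.0, 0.0, 0.0])  # pixel mixes two of them
x = D @ a_true
a_hat = nonneg_ista(x, D, lam=1e-3)
```

Because the abundances are constrained to be non-negative, the l1 norm reduces to a plain sum, so the soft-thresholding step becomes a shift by `step * lam` followed by clipping at zero.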
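For $\mathbf{U}$, since the penalty separates over the row index $j$, optimization Problem (21) is equivalent to
$$\min_{\mathbf{u}^j} \; \beta \left( I\left(\mathbf{u}^j\right) + \frac{\gamma}{2} \left\| \mathbf{a}^j - \mathbf{u}^j \right\|_2^2 \right), \quad j = 1, \ldots, K \qquad (23)$$
which is a conventional mixed-norm penalty problem in the MMPI framework and can be solved analytically [35].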
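The analytical row-wise solution can be illustrated under one explicit assumption: that the MMPI penalty for a row a takes the Moreau-envelope form min over u of I(u) + 0.5*gamma*||a - u||^2, with I equal to 1 for any non-zero vector and 0 otherwise. Under that assumption, choosing u = 0 costs 0.5*gamma*||a||^2 and choosing u = a costs 1, so the minimizer keeps the row only when its energy exceeds the threshold. The function name `update_rows` and the toy matrix are hypothetical.

```python
import numpy as np

def update_rows(A, gamma):
    """Row-wise minimizer of I(u) + 0.5*gamma*||a - u||^2 and its minimum value."""
    energy = 0.5 * gamma * np.sum(A ** 2, axis=1)   # cost of choosing u = 0
    keep = energy > 1.0                             # cost of choosing u = a is 1
    U = np.where(keep[:, None], A, 0.0)             # hard threshold on whole rows
    envelope = np.minimum(energy, 1.0)              # value of the penalty per row
    return U, envelope

A = np.array([[0.50, 0.40, 0.60],    # strongly used atom
              [0.01, 0.00, 0.02],    # weakly used atom
              [0.00, 0.00, 0.00]])   # unused atom
U_small, env_small = update_rows(A, gamma=10.0)
U_large, env_large = update_rows(A, gamma=1e6)
```

With a moderate `gamma` the weakly used atom is pruned to zero, whereas for a very large `gamma` the penalty values coincide with the exact row indicator, which matches the limiting behavior described for the MMPI penalty.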
In summary, an adaptive dictionary learning approach is used to extract the background endmembers; it is described as pseudocode in Algorithm 1.
Algorithm 1 Adaptive dictionary learning-based background endmember extraction (ADLBEE)
Input: Original HSI $\mathbf{X}$, regularization parameters $\lambda$ and $\beta$, number of iterations $T$
Output: Background endmember matrix $\mathbf{D}$
1 Initialization: set the initial values; generate the initial background endmember matrix $\mathbf{D}$ randomly
2 Repeat
3   Repeat
4     Solve Equation (22) to obtain $\mathbf{A}$ with $\mathbf{U}$ fixed
5     Substitute $\mathbf{A}$ into Equation (23) to obtain $\mathbf{U}$
6   Until the objective Function (21) converges
7   Substitute $\mathbf{A}$ into Equation (20) to obtain $\mathbf{D}$ by the gradient descent method
8 Until $\mathbf{D}$ converges or the number of iterations reaches $T$
9 Return background endmembers $\mathbf{D}$
Once the background endmember matrix has been extracted, the corresponding abundance coefficient matrix can be solved. Given how objects are distributed in real-world scenes, a hyperspectral image may contain many endmembers, but an individual mixed pixel typically consists of only a few, commonly two or three. It is therefore assumed in this study that the abundance coefficient vector associated with each mixed pixel is sparse, so that the sparse abundance coefficient matrix can be estimated with a sparse unmixing approach.
Furthermore, because mixed pixels are spatially coherent with their neighbors, the abundance values of the same endmember should vary gradually across adjacent pixels [36]. Hence, a total-variation (TV) regularization term is incorporated into the unmixing process to promote spatial continuity. The abundance coefficient matrix is then obtained by minimizing the following expression:
$$\min_{\mathbf{A} \ge 0} \; \frac{1}{2}\left\| \mathbf{X} - \mathbf{D}\mathbf{A} \right\|_F^2 + \lambda_{1} \left\| \mathbf{A} \right\|_{1,1} + \lambda_{\mathrm{TV}}\, \mathrm{TV}(\mathbf{A}) \qquad (24)$$
where $\lambda_{1}$ and $\lambda_{\mathrm{TV}}$ are regularization parameters and
$$\mathrm{TV}(\mathbf{A}) = \sum_{\{i,k\} \in \varepsilon} \left\| \mathbf{a}_i - \mathbf{a}_k \right\|_1$$
is the vector extension of the anisotropic TV, with $\varepsilon$ denoting the set of horizontally and vertically neighboring pixel pairs; it enforces a gradual transition of the abundance values of the same endmember across contiguous pixels.
The optimization problem in Equation (24) can be viewed as a constrained basis pursuit denoising (CBPDN) problem that incorporates neighboring spatial information. It can be solved with a spectral unmixing algorithm that employs variable splitting and the augmented Lagrangian approach together with TV regularization, which yields the estimate of the abundance coefficient matrix.
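The effect of the TV term can be illustrated by evaluating it on abundance maps. The sketch below assumes an anisotropic TV, that is, the sum of absolute differences between horizontally and vertically adjacent pixels for every endmember; the function name `anisotropic_tv` and the two toy maps are illustrative assumptions, and the sketch evaluates the regularizer only rather than solving the unmixing problem.

```python
import numpy as np

def anisotropic_tv(abundance_maps):
    """Sum of absolute differences between 4-neighbor pixels, over all endmembers.

    abundance_maps: array of shape (endmembers, rows, cols).
    """
    vertical = np.abs(np.diff(abundance_maps, axis=1)).sum()
    horizontal = np.abs(np.diff(abundance_maps, axis=2)).sum()
    return vertical + horizontal

smooth = np.tile(np.linspace(0.0, 1.0, 5), (1, 5, 1))   # gentle ramp, shape (1, 5, 5)
rng = np.random.default_rng(3)
noisy = rng.random((1, 5, 5))                            # spatially incoherent map
tv_smooth = anisotropic_tv(smooth)
tv_noisy = anisotropic_tv(noisy)
```

The smoothly varying map receives a much smaller penalty than the spatially incoherent one, which is why minimizing this term favors gradual abundance transitions between neighboring pixels.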
5. Subpixel Object Detection with Pixel Reconstruction Detection Operator
With the background endmember matrix $\mathbf{D}$ derived for the HSI under detection and the known spectral signatures of the objects of interest, $\mathbf{S}$, the composite endmember matrix containing all spectra in the scene can be denoted as $\mathbf{E} = [\mathbf{D}, \mathbf{S}]$. For the composite endmember matrix $\mathbf{E}$, the associated abundance coefficient matrix is obtained by inverse solution with the non-negative least-squares (NNLS) algorithm.
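The NNLS step can be sketched with `scipy.optimize.nnls`, which solves one non-negative least-squares problem per call and returns the solution together with the residual norm. The names `D`, `S`, `E`, and `nnls_abundances`, as well as the synthetic spectra, are illustrative assumptions; the composite matrix is formed here by simple column stacking.

```python
import numpy as np
from scipy.optimize import nnls

def nnls_abundances(E, X):
    """Solve min ||E a_i - x_i||_2 s.t. a_i >= 0 for every pixel (column of X)."""
    A = np.zeros((E.shape[1], X.shape[1]))
    residuals = np.zeros(X.shape[1])
    for i in range(X.shape[1]):
        A[:, i], residuals[i] = nnls(E, X[:, i])   # (solution, residual norm)
    return A, residuals

rng = np.random.default_rng(4)
D = rng.random((60, 3))                 # background endmembers (illustrative)
S = rng.random((60, 1))                 # one known object spectrum
E = np.hstack([D, S])                   # composite endmember matrix
A_true = rng.random((4, 10))            # non-negative abundances for 10 pixels
X = E @ A_true
A_hat, res = nnls_abundances(E, X)
```

On noise-free synthetic pixels the recovered abundances match the ones used to generate the data; with real data the residual norms returned alongside them give the per-pixel reconstruction error.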
Given that each pixel in the hyperspectral dataset under investigation adheres to the linear mixing model (LMM), the background pixels can be reconstructed from the background endmember matrix $\mathbf{D}$ together with the corresponding abundance vectors $\mathbf{a}_i^{b}$, resulting in the following representation:
$$\hat{\mathbf{x}}_i^{b} = \mathbf{D}\mathbf{a}_i^{b} \qquad (25)$$
For pixels that contain the spectral signature of the terrestrial object to be detected, reconstructing the pixel spectrum from the composite endmember matrix $\mathbf{E}$ and the corresponding abundance coefficient vector $\mathbf{a}_i^{t}$ is more accurate and gives a smaller error:
$$\hat{\mathbf{x}}_i^{t} = \mathbf{E}\mathbf{a}_i^{t} \qquad (26)$$
Therefore, subpixel objects on hyperspectral images are detected using the background reconstruction detection operator [
31],
where
and
denote scaling coefficients, while
represents the error associated with reconstruction by utilizing the endmember matrix
, which comprises the spectra of the features for detection, in conjunction with the corresponding abundance coefficient
. Analogously,
denotes the reconstruction error arising from the utilization of the endmember matrix
, which exclusively consists of the spectrum of the background objects, in conjunction with the corresponding abundance coefficient
.
For the sake of brevity, we denote
When
represents a background pixel devoid of the spectrum pertinent to the object under detection, the reconstruction error yielded through the application of the aforementioned methods should fulfill the following condition:
At this time, by stretching the scale coefficient
, the current pixel detection value can approach zero infinitely, i.e.,
Therefore, it is shown that when the scale coefficient is larger, the coordinate of the value calculated using the detection operator is closer to the -axis. Then, since , the value of the scale coefficient has little impact and can be neglected. Therefore, when the current detection pixel is a background pixel, the calculated value of the detection operator is approximately zero.
When
is the pixel of the object to be inspected, whose spectrum it contains, the reconstruction error yielded through the application of the aforementioned methods should satisfy the following condition:
This is because, when the pixel under detection contains the spectrum of the object of interest, reconstruction from the background endmember matrix alone tends to produce significant errors. In contrast, when the composite matrix that also includes the endmembers of the object to be detected is used, the reconstruction is more accurate and the error is very small.
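The contrast between the two reconstruction errors can be checked numerically. The sketch below assumes noise-free linear mixing, a single object spectrum, and NNLS fitting for both the background-only and the composite endmember matrices; all names and values are hypothetical, and the final detection operator itself is not reproduced.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(5)
D = rng.random((80, 3))                       # background endmembers
s = rng.random(80)                            # object spectrum
E = np.column_stack([D, s])                   # composite endmember matrix

background_pixel = D @ np.array([0.5, 0.3, 0.2])
object_pixel = 0.7 * background_pixel + 0.3 * s   # subpixel object, 30% fill

def reconstruction_errors(x):
    """Return (error with background endmembers only, error with composite matrix)."""
    _, err_background = nnls(D, x)
    _, err_composite = nnls(E, x)
    return err_background, err_composite

rb_bg, rt_bg = reconstruction_errors(background_pixel)
rb_obj, rt_obj = reconstruction_errors(object_pixel)
```

For the background pixel both errors are essentially zero, whereas for the object pixel only the composite matrix reconstructs the spectrum accurately. This gap between the two errors is what a reconstruction-based detector exploits.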
Concurrently, through the scaling of the coefficient
, the value of the current detected pixel can approach 1 infinitely, i.e.,
It is shown with a larger , the coordinate calculated using the detection operator is closer to the y-axis. Since , the value of has very little impact and can be neglected. Therefore, when the current detection pixel contains the spectrum of the ground object to be detected, the value calculated using the detection operator is approximately 1.
In summary, by applying the pixel reconstruction detection operator to all pixels in the hyperspectral image, background pixels are suppressed while pixels containing the object to be detected are enhanced, which yields the subpixel ground object detection result.