1. Introduction
Hyperspectral imaging (HSI) is a versatile technology that combines imaging and spectroscopy to capture the spatial and spectral characteristics of targets simultaneously. The resulting data are organized into a three-dimensional cube comprising two spatial dimensions and one spectral dimension, collectively forming a hypercube [1]. In hyperspectral remote sensing, this capability spans multi-band imaging across the visible and infrared spectra, enabling analyses at the molecular and even atomic scales and surpassing traditional optical remote sensing, which offers only limited spectral information. Hyperspectral remote sensing has recently demonstrated its utility across varied domains, such as environmental monitoring [2], fire detection [3], geographical mapping [4], precision agriculture [5,6], and atmospheric and oceanic observation [7,8]. However, despite its high spectral resolution, hyperspectral imagery has a limited ability to discern fine object details: because information is captured only where objects reflect light at specific wavelengths, its discriminatory power is subdued, which impairs the precise depiction of boundaries and shapes. To alleviate the challenges posed by low spatial resolution and limited detail, fusion techniques combining hyperspectral images with panchromatic imagery have been explored. Hyperspectral image fusion has long challenged researchers: traditional methods struggle to balance the richness of hyperspectral data against enhanced spatial resolution, and many existing approaches either compromise spatial resolution to preserve spectral data or forfeit crucial spectral information in favor of enhanced spatial resolution. This enduring trade-off poses the central question in hyperspectral image fusion: how to enhance spatial resolution while retaining the depth of hyperspectral data.
In recent times, there has been a notable surge in the exploration of panchromatic–hyperspectral image fusion techniques. Image fusion methodologies are categorized into three distinct levels based on their processing stages: pixel level [9], feature level [10,11], and decision level [12,13]. Pixel-level fusion focuses on the individual pixel points within two images, offering heightened accuracy and detailed information by directly manipulating the original data. However, pixel-level fusion demands extensive data processing, surpassing the complexity of the feature and decision levels, and requires meticulous alignment prior to fusion.
At the pixel level, color space-based methods, principal component analysis (PCA), and multi-resolution transformation techniques constitute common strategies [14,15]. Color space-based fusion involves transitioning images from the RGB color model to a sequential color system, employing methodologies such as the HIS transform [16,17] and Brovey transform [18,19]. The HIS transform adeptly segregates spatial and spectral data. However, the component substitution technique is confined solely to pixel-level operation, making it prone to spectral aliasing; straightforward pixel-wise substitution causes the loss of intricate details in the fused image. To address this shortcoming, a combined methodology intertwining the differential search algorithm, adaptive regional segmentation, HIS conversion, and RGB band processing was proposed [20]. PCA is an image fusion technique that amalgamates multiple images by reducing data dimensions and extracting essential features [21]. However, its application may lead to information loss and the imposition of linear assumptions on intricate relationships, thereby constraining its efficacy.
Numerous researchers have delved into multi-resolution image fusion techniques. Toet introduced contrast pyramids in Gaussian pyramid-based fusion [22], while Burt and Kolczynski derived gradient pyramids from Gaussian pyramids [23,24]. However, the pyramid transformation lacks translational invariance, potentially leading to spurious Gibbs artifacts in the fused images [25]. Chipman proposed fusion using orthogonal wavelets [26], Li presented a digital filter for consistency verification [27], and Liu utilized a controlled pyramid algorithm [28]. Other methods involve Li Zhenhua's pyramid frame transform [29] and Matsopoulos' application of morphological pyramids in medical image fusion [30]. While these advanced methods demonstrate progress, they seem to overlook comprehensive spatial consistency, possibly resulting in color and brightness distortions in the fused outcomes. To address this, proposed solutions include guided filters [31] and bilateral filters [32], which effectively tackle spatial consistency concerns and reduce edge artifacts [33]. However, conventional bilateral filters exhibit limitations in effective image smoothing. To overcome this, Chen, B.H. introduced an innovative two-pass bilateral filtering approach for edge-preserving image smoothing, demonstrating exceptional performance [34,35]. Additionally, in the realm of multi-modal image fusion research, Goyal et al. focused on structure awareness and metric analysis, while Dogra and Kumar emphasized the use of guided filtering and the multidirectional shearlet transform in medical image fusion [36,37].
The progress in Gaussian pyramid-based methods has not resolved the challenge of losing high-frequency detail during operations. To address this, a technique based on a Laplacian pyramid directional filter bank was proposed to enhance fusion outcomes [38,39]. These advancements significantly impacted medical image processing, particularly Laplacian pyramid-based techniques. Methods employing Laplacian pyramids and adaptive sparse representation were explored [40,41], notably improving lung cancer diagnosis through CT image fusion and the integration of multimodal medical images. Moreover, in reference [42], fusion methodologies were substantially advanced by combining a Laplacian pyramid with deep learning, surpassing conventional techniques in image fusion capabilities.
Traditional multi-scale pyramid image fusion methods have undergone significant advancements and applications in the domain of deep learning. The work conducted by Ji, Peng, and Xu exemplifies the practical implementation of deep learning models in conjunction with multi-scale pyramid image enhancement techniques for real-time underwater river crab detection [43]. Notably, there has been burgeoning interest in leveraging intelligent algorithms such as convolutional neural networks (CNNs) and generative adversarial networks (GANs) [44,45]. These methodologies have proven adept at effectively fusing image information, thereby elevating image quality and augmenting features [46]. Specifically, the adoption of deep learning algorithms rooted in neural network theory has gained substantial prominence in the field of image fusion. Multi-focus image fusion (MFIF) can generate omnifocal images tailored to visual requirements, and ongoing research endeavors aim to mitigate the defocus spreading effects (DSEs) typically observed around focus/defocus boundaries. In response to DSE challenges, an innovative generative adversarial network, termed MFIF-GAN, was introduced for the specific task of MFIF [47].
The application of the wavelet transform holds a paramount position among multi-resolution image fusion techniques [48]. Revered for its exceptional time–frequency localization, the wavelet transform facilitates thorough multi-scale image analysis [49]. This influential method adeptly captures both spatial and frequency domain characteristics, offering a comprehensive and intuitive depiction of images. Its capability encompasses the analysis of elements varying in size and resolution within a single image, ensuring a detailed representation. Advanced versions of the wavelet transform, such as the variational binary wavelet transform [50], multi-binary wavelet transform [51], and boosted structure wavelet transform [52], have expanded its potential, opening new avenues for further advancements in image fusion. However, there exists an exigent need for ongoing research to address computational redundancy within the wavelet transform and to devise innovative strategies to enhance computational efficiency. This endeavor is critical for ensuring spectral consistency in hyperspectral images, mitigating spectral aliasing, and preserving intricate high-frequency intensity components, which are often lost in the original HIS transform algorithm.
In this context, our research presents an innovative methodology for hyperspectral image fusion, integrating the HIS transform, wavelet transform, and Trust-Region Conjugate Gradient techniques. This pioneering approach is designed to enhance spatial resolution while preserving the abundance of spectral information. Our study is dedicated to resolving key questions regarding the preservation of spectral nuances in hyperspectral image fusion. We firmly anticipate that our contributions will propel advancements in the field, providing more detailed and accurate images tailored for practical applications.
The main contributions of this work are summarized as follows:
- (1) We present the Trust-Region Conjugate Gradient (TRCG) method, an optimization technique for enhancing image fusion accuracy and efficiency, and explore its principles, mathematics, and practical applications.
- (2) To approximate the true solution more accurately, we employed a two-tiered strategy: the inner layer is guided by the truncated conjugate gradient (TCG) for local optimization, and the outer layer uses the trust region algorithm (TRA) to ensure global convergence.
- (3) We conducted extensive experiments on widely used datasets, consistently achieving satisfactory performance compared with the latest hyperspectral image fusion methods.
2. Materials and Methods
In this section, we aim to elaborate on the experimental setup of our hyperspectral imaging system and elucidate the methodologies adopted for hyperspectral image fusion. We provide an extensive explanation of the core principles and practical applications of crucial techniques, notably the Hue–Intensity–Saturation (HIS) transform, wavelet transform, and the Trust-Region Conjugate Gradient method. Additionally, we delve into the empirical data, the nuances of our experimental design, and the relevant evaluation metrics employed in our scholarly exploration.
2.1. Experimental Design of Hyperspectral Imaging System
We provide an overview of the hyperspectral imaging system employed in our study and detail our designed experimental procedure. Our experimental design is tailored to acquire high-quality hyperspectral image data, which serve as the foundation for subsequent processing and analysis.
2.1.1. Hyperspectral Imaging System
In the domain of contemporary hyperspectral imaging (HSI), the acquisition and processing of high-quality data hold paramount significance. This study delves into an innovative experimental design tailored to augment both data quality and subsequent fusion processes.
Figure 1 presents a comprehensive portrayal of our HSI system, structured into three distinct modules: a narrow-band light generator, an imaging section, and a control system. This schematic delineates the design and operational principles governing these modules.
The narrow-band light generator incorporates a Xenon source (55 W, 6000 K) to produce a broad spectrum of light. This source interfaces with several optical elements, including lenses and a reflective ruled diffraction grating (1800 lines/mm, angular dispersion rate 1.8 mrad/nm), inducing chromatic dispersion. An imaging lens (Φ = 38 mm, f = 200 mm) converges parallel rays. The optical components, including lenses and a small-aperture light stop (Φ = 1 mm), shape the light rays, ensuring parallel incidence onto the grating. The grating diffracts the light, focusing it at the imaging lens’s focal plane, generating a chromatic band known as the first-order diffraction spectrum. A small aperture at the focal point transmits narrow-band light at a specific wavelength. The grating’s rotation, managed by a rotating platform, adjusts emitted light wavelengths by varying the incident light angle. The lighting section and the narrow-band light generator connect solely via an optical fiber (Φ = 4 mm).
The selected narrow-band light illuminates the tissue samples, which are imaged by a complementary metal oxide semiconductor (CMOS) camera featuring a 1280 × 1024 array and 5.2 μm square pixels and operating at 15 frames per second. The control system, comprising computer hardware and software, commands two modules, the MCS-51 microcontroller and an electromotor, which control the rotation platform to adjust the narrow-band light's center wavelength. The CMOS camera captures images at various wavelengths. The control system synchronizes wavelength switching and image acquisition, storing the raw data as a hypercube.
This modular design offers potential integration into modern consumer imaging products. For this study, the HSI system was installed on a stereomicroscope XTZ-E, boasting magnification ranging from 7× to 45× (Shanghai Optical Instrument Factory, Shanghai, China).
2.1.2. Experimental Procedure
Our approach involved meticulous data collection using cutting-edge hyperspectral imaging systems, followed by an extensive preprocessing stage. This preprocessing included crucial tasks such as noise reduction, radiometric correction, and meticulous image registration.
The spectroscopic measurement of monochromatic light generated by an active monochromatic hyperspectral imaging system holds paramount importance in ensuring the precision and efficacy of the imaging device. This meticulous process, especially concerning the RGB spectral bands, is vividly depicted in Figure 2.
The top row exhibits the spectral power density curve, while the second and third rows showcase the CIE 1931 and CIE 1964 chromaticity diagrams, respectively. Detailed measurements, facilitated by the UPRtek MK350S spectrometer (UPRtek, New Taipei City, China), were conducted to ensure precise wavelength control and spectral fidelity.
Analysis of the spectral power density curve, CIE 1931 chromaticity diagram, and CIE 1964 chromaticity diagram allows us to evaluate the monochromatic performance of the spectrometer. The spectral power density curve illustrates the relative intensity of light across various wavelengths, displaying narrow and sharp peaks that denote the spectrometer’s exceptional monochromaticity. It effectively segregates light of different wavelengths. The CIE 1931 and CIE 1964 chromaticity diagrams indicate the positions of light at different wavelengths within the color space. The accurate representation of chromaticity coordinates in these diagrams confirms the spectrometer’s commendable monochromatic performance.
The spectrometer’s monochromatic performance is pivotal for its functionality, as it directly influences its capacity to precisely resolve and measure light of diverse wavelengths. Enhanced monochromaticity significantly improves the spectrometer’s accuracy in color measurement, spectral analysis, and other applications by allowing it to precisely differentiate and measure light wavelengths. This feature not only delivers high-resolution monochromatic imaging but also demonstrates exceptional wavelength stability and the capability to precisely select spectral bands. Our meticulous spectroscopic measurement and spectral analysis of monochromatic light, generated by the active monochromatic hyperspectral imaging system, not only yield top-tier data but also unleash the instrument’s full potential. This comprehensive approach not only enriches our understanding of spectral characteristics but also furnishes reliable spectral support across diverse domains, ultimately propelling advancements in research and the seamless integration of this technology into practical applications.
To exhibit the effectiveness of our preprocessing procedure, Figure 3 showcases preprocessed images captured using an RGB camera. These images encompass various lighting conditions, including (a) red lighting, (b) green lighting, (c) blue lighting, and (d) synthesized color hyperspectral images. Additionally, (e) grayscale images captured under full-spectrum illumination are included for comprehensive evaluation. These images unequivocally demonstrate the success of our preprocessing method in elevating the overall quality and consistency of hyperspectral data. The application of these methods significantly enhances our hyperspectral data, ensuring a robust and accurate foundation for the subsequent fusion process.
2.2. HIS Transformation
Section 2.2 delves into the HIS color model and its role in hyperspectral image fusion. The exploration commences with an introduction to the HIS color model (Section 2.2.1), followed by an elucidation of its application in hyperspectral image fusion (Section 2.2.2). These sections aim to offer comprehensive insight into the utilization of the HIS color model to augment image quality and information fusion.
2.2.1. HIS Color Model
HIS stands for hue, intensity, and saturation. Based on the RGB color system, an RGB color image can be decomposed into R, G, and B channels, and these channels can be converted into H, I, and S components by a mathematical transformation, which constitutes the HIS transformation of an RGB color image. The majority of images we encounter in daily life are color images, although an image is fundamentally two-dimensional data, with pixels typically arranged as an m × n array; a single such array is referred to as a grayscale image, commonly recognized as a black-and-white image. However, the representation of color images requires an understanding of colorimetry. The CIE 1931 RGB color space is the most prevalent standard, wherein the combination of the three primary colors R (red), G (green), and B (blue) is determined by their respective tristimulus values to create a color image. Consequently, RGB color images can be decomposed into three separate images corresponding to the R, G, and B channels.
While the RGB color space is employed for color mixing and computation, the perception of an object's color in daily life requires a color perception system, known as a color order system. The Munsell color system, an example of a color order system, defines three parameters, namely brightness, hue, and chroma (saturation), to characterize color. When observing objects, the Munsell Color Chart can be used for comparison, enabling the confirmation of an object's color. Analogous to the Munsell color system, the HIS color model comprises I (intensity/luminance), S (saturation), and H (hue). Based on the RGB color space, RGB color images can be decomposed into their R, G, and B channels, and mathematical transformations can be applied to convert the R, G, and B channel components into H, I, and S channel components. This process constitutes the HIS transformation of an RGB color image. In the HIS representation, the intensity I conveys spatial information, while H and S carry spectral information, thereby achieving the separation of spectral and spatial information.
2.2.2. Hyperspectral Image Fusion Based on HIS Transform
In the process of fusing panchromatic and hyperspectral images based on the HIS transform, we began by performing the HIS transform on the RGB color hyperspectral images. In Appendix A, we present the equations characterizing the linear RGB to HIS transformation. Following this transformation, the panchromatic grayscale image was preprocessed and introduced as a new component, referred to as 'I' (intensity), within the color sequence system. This 'I' component was then fused with the 'H' (hue) and 'S' (saturation) components of the hyperspectral image. Subsequently, the HIS transformation was reversed, returning the data to the RGB color space and yielding the final RGB color fusion image.
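To make this substitution step concrete, the following minimal Python sketch applies a commonly used linear IHS (HIS) transform, replaces the intensity component with a histogram-matched panchromatic band, and inverts the transform. The matrices shown are the standard linear IHS pair from the pansharpening literature, not necessarily the exact equations of Appendix A, and the helper name `his_fuse` is hypothetical.

```python
# Minimal sketch of HIS-substitution fusion (hypothetical helper name).
# Assumes the widely used linear IHS matrices; the paper's exact transform
# is given in its Appendix A and may differ.
import numpy as np

RGB2IV = np.array([[1/3,            1/3,           1/3],
                   [-np.sqrt(2)/6, -np.sqrt(2)/6,  np.sqrt(2)/3],
                   [1/np.sqrt(2),  -1/np.sqrt(2),  0.0]])
IV2RGB = np.linalg.inv(RGB2IV)

def his_fuse(rgb, pan):
    """rgb: (H, W, 3) float image in [0, 1]; pan: (H, W) panchromatic band."""
    iv = rgb.reshape(-1, 3) @ RGB2IV.T          # forward transform: (I, v1, v2)
    i_old = iv[:, 0]
    p = pan.ravel()
    # Simple mean/std matching of the pan band to the original intensity.
    p = (p - p.mean()) / (p.std() + 1e-12) * i_old.std() + i_old.mean()
    iv[:, 0] = p                                # substitute the intensity component
    fused = iv @ IV2RGB.T                       # inverse transform back to RGB
    return np.clip(fused.reshape(rgb.shape), 0.0, 1.0)
```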
2.3. Wavelet Transform (WT)
The wavelet transform represents a method capable of concurrently considering both the spatial and frequency domain attributes of an image. By decomposing the image into various frequency components across multiple scales, it enables a multi-scale analysis of the image. In terms of image fusion, the wavelet transform contributes to enhancing the spatial precision of the image. It achieves this by analyzing and integrating image details at different scales, thereby facilitating multi-scale processing of image details.
In Section 2.3, we delve into the wavelet transform and its crucial application in image fusion. First, we introduce the wavelet transform and its mathematical principles (Section 2.3.1), elucidating its underlying concepts and mathematical foundations. Subsequently, we provide a detailed discussion of the application of the wavelet transform in image processing (Section 2.3.2), encompassing specific transformation methods and processes. Finally, we explore the role of the wavelet transform in image fusion (Section 2.3.3), offering an in-depth analysis of its pivotal contribution to the fusion process.
2.3.1. Wavelets’ Mathematical Principles
The mathematical representation of the Continuous Wavelet Transform (CWT) involves convolving a function, often referred to as the mother wavelet $\psi(t)$, with the signal $f(t)$ across varying scales and translations. The CWT of a signal $f(t)$ with respect to a mother wavelet $\psi(t)$ at a scale $a$ and translation $b$ is given by
$$W_f(a,b) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{+\infty} f(t)\, \psi^{*}\!\left(\frac{t-b}{a}\right) \mathrm{d}t,$$
where $W_f(a,b)$ represents the Continuous Wavelet Transform of $f(t)$ at scale $a$ and translation $b$, $\psi(t)$ denotes the mother wavelet, $a$ is the scale variable that controls the scaling of the wavelet basis, $b$ is the translation quantity that controls the translation of the wavelet basis, and $a$ and $t$ correspond to the inverse of frequency and time, respectively.
The considerable computational intricacy of the Continuous Wavelet Transform (CWT) and its constrained implementation in discrete systems have restricted its broad utility in practical engineering and data processing. To surmount these constraints and furnish more effective analytical tools, the Discrete Wavelet Transform (DWT) was introduced and has been extensively embraced. The DWT, a discrete counterpart of the CWT, decomposes a signal into components of diverse scales and frequencies, but it employs an alternative approach to achieve this decomposition. Whereas the CWT entails convolving the signal with continuous wavelets, the DWT utilizes sampling and filter bank methodologies, rendering it more suitable for practical use, particularly in the domain of image processing.
The operation of the DWT on a discrete signal $x[n]$ involves signal decomposition using a low-pass filter $g[n]$ and a high-pass filter $h[n]$, followed by downsampling. In a single-level DWT, the signal $x[n]$ can be decomposed into approximation coefficients $cA[k]$ (representing low-frequency components) and detail coefficients $cD[k]$ (representing high-frequency components):
$$cA[k] = \alpha \sum_{n} x[n]\, g[2k-n], \qquad cD[k] = \alpha \sum_{n} x[n]\, h[2k-n],$$
where $\alpha$ is the normalization factor, usually $\alpha = 1/\sqrt{2}$; $x[n]$ is the input signal; $n$ is the sample index; $k$ is the index of the downsampled coefficients; $cA[k]$ denotes the approximation coefficients; $cD[k]$ represents the detail coefficients (representing high-frequency components); and $g[n]$ and $h[n]$ are, respectively, the low-pass and high-pass filters.
2.3.2. Wavelet Transform of Image
When dealing with two-dimensional data such as images, the extension of the one-dimensional DWT to the two-dimensional domain becomes imperative. This extension is known as the Two-Dimensional Discrete Wavelet Transform (2D DWT). The 2D DWT of an image function $f(x,y)$ of size $M \times N$ is as follows:
$$W_{\varphi}(j_0, m, n) = \frac{1}{\sqrt{MN}} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x,y)\, \varphi_{j_0, m, n}(x,y),$$
$$W_{\psi}^{i}(j, m, n) = \frac{1}{\sqrt{MN}} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x,y)\, \psi_{j, m, n}^{i}(x,y), \quad i \in \{H, V, D\},$$
where the superscript $i$ denotes H (horizontal direction), V (vertical direction), or D (diagonal direction). The index $j_0$ signifies an arbitrary starting scale. The approximation coefficient $W_{\varphi}(j_0, m, n)$ defines the approximation of $f(x,y)$ at scale $j_0$. The detail coefficients $W_{\psi}^{i}(j, m, n)$ add horizontal, vertical, and diagonal details for scales $j \geq j_0$. The approximation coefficients $W_{\varphi}$ represent the low-frequency information of the image, capturing its overall characteristics. The horizontal detail coefficients $W_{\psi}^{H}$ encompass high-frequency information in the horizontal direction of the image, the vertical detail coefficients $W_{\psi}^{V}$ encompass high-frequency information in the vertical direction, and the diagonal detail coefficients $W_{\psi}^{D}$ contain high-frequency information in the diagonal direction. It is often set that $j_0 = 0$, $N = M = 2^{J}$, $j = 0, 1, \ldots, J-1$, and $m, n = 0, 1, \ldots, 2^{j}-1$. The scaling function $\varphi$ and the wavelet functions $\psi^{i}$ act, respectively, as the low-pass and high-pass filters.
In our preceding discussion of the 2D DWT, we introduced the principal variables: the original image, denoted as $f(x,y)$, together with its corresponding approximation coefficients $W_{\varphi}$ and detail coefficients $W_{\psi}^{H}$, $W_{\psi}^{V}$, and $W_{\psi}^{D}$. Turning to the reconstruction of the original image from these coefficients, we employ the Two-Dimensional Inverse Discrete Wavelet Transform (2D IDWT). This inverse transformation combines the $W_{\varphi}$, $W_{\psi}^{H}$, $W_{\psi}^{V}$, and $W_{\psi}^{D}$ coefficients acquired from the 2D DWT to regenerate the original two-dimensional image $f(x,y)$:
$$f(x,y) = \frac{1}{\sqrt{MN}} \sum_{m} \sum_{n} W_{\varphi}(j_0, m, n)\, \varphi_{j_0, m, n}(x,y) + \frac{1}{\sqrt{MN}} \sum_{i \in \{H,V,D\}} \sum_{j=j_0}^{\infty} \sum_{m} \sum_{n} W_{\psi}^{i}(j, m, n)\, \psi_{j, m, n}^{i}(x,y).$$
The Two-Dimensional Discrete Wavelet Transform (2D DWT) can be implemented using filtering and subsampling techniques. Initially, the 1D DWT is applied to each row of the image. Subsequently, the obtained results undergo another one-dimensional DWT in the column direction. In practical implementation, especially in computer programming, there might be a preference for filtering columns first and then rows, as this aligns better with the computational handling of image data. This approach can enhance efficiency or fulfill hardware requirements.
Figure 4 depicts this process. The image $f(x,y)$ serves as the input to the 2D DWT, undergoing convolution with the high-pass filter $h$ and the low-pass filter $g$ along the columns separately, followed by subsampling. The high-pass components depict the image's vertical directional details, while the low-pass approximate components portray low-frequency vertical information. This process yields two sub-images, each with its resolution halved by a factor of 2. Subsequently, the resulting two sub-images are filtered and subsampled along the rows, generating four quarter-sized images denoted as $W_{\varphi}$, $W_{\psi}^{H}$, $W_{\psi}^{V}$, and $W_{\psi}^{D}$. The approximation coefficients $W_{\varphi}$ encapsulate the overall image characteristics, while the detail coefficients $W_{\psi}^{H}$, $W_{\psi}^{V}$, and $W_{\psi}^{D}$ represent detailed information in the horizontal, vertical, and diagonal directions, respectively.
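This row/column filtering scheme is what library routines such as PyWavelets' `dwt2` implement internally; the short sketch below produces the four quarter-sized sub-bands and verifies reconstruction. The wavelet choice and image size are illustrative assumptions.

```python
# Single-level 2D DWT yielding the approximation and H/V/D detail sub-bands.
import numpy as np
import pywt

img = np.random.rand(256, 256)                  # stand-in for the intensity image
cA, (cH, cV, cD) = pywt.dwt2(img, 'haar')       # W_phi, W_psi^H, W_psi^V, W_psi^D
img_rec = pywt.idwt2((cA, (cH, cV, cD)), 'haar')
print(cA.shape, np.allclose(img, img_rec))      # (128, 128) quarter-size sub-bands
```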
Figure 5 illustrates the image $W_{\varphi}(j+1, m, n)$ serving as the input at scale $j+1$. Through a 2D DWT, it generates the approximation coefficients $W_{\varphi}(j, m, n)$ and the detail coefficients $W_{\psi}^{H}(j, m, n)$, $W_{\psi}^{V}(j, m, n)$, and $W_{\psi}^{D}(j, m, n)$ at scale $j$. Subsequently, $W_{\varphi}(j, m, n)$ is utilized as the input for another 2D DWT, yielding the approximation coefficients $W_{\varphi}(j-1, m, n)$ and the detail coefficients $W_{\psi}^{H}(j-1, m, n)$, $W_{\psi}^{V}(j-1, m, n)$, and $W_{\psi}^{D}(j-1, m, n)$ at scale $j-1$.
2.3.3. Application of Wavelets in Image Fusion
The prior discourse delineated the fundamental principles of wavelet transforms in image processing, elucidating their role in feature extraction and analysis. Concerning image fusion, wavelet transforms amalgamate multiple images or diverse image features to generate a composite image enriched with comprehensive information. Leveraging the multi-scale nature of wavelet transforms aids in capturing intricate details at varying scales, better preserving vital features that might be challenging to depict entirely within individual images.
In practical application, initiating the process involves a 2D DWT performed on each original image. Post decomposition of each image based on designated wavelet types and decomposition levels, fusion processing is carried out on the different decomposition layers. Distinct fusion operators can be applied to the various frequency components in each decomposition layer, culminating in a fused wavelet pyramid. Ultimately, the fused wavelet pyramid undergoes reconstruction via the 2D IDWT to yield the fused image.
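A hedged sketch of this wavelet-domain fusion procedure is given below: both intensity images are decomposed with `wavedec2`, the low-frequency bands are averaged, the detail bands are fused with a maximum-absolute-value rule, and the result is reconstructed with `waverec2`. The specific fusion operators shown are common choices rather than the exact operators used in this work.

```python
# Sketch of multi-level wavelet-domain fusion of two intensity images.
import numpy as np
import pywt

def wavelet_fuse(i_hsi, i_pan, wavelet='db2', levels=3):
    c1 = pywt.wavedec2(i_hsi, wavelet, level=levels)
    c2 = pywt.wavedec2(i_pan, wavelet, level=levels)
    fused = [(c1[0] + c2[0]) / 2.0]                      # low-frequency band: average
    for d1, d2 in zip(c1[1:], c2[1:]):                   # per level: (H, V, D) details
        fused.append(tuple(np.where(np.abs(a) >= np.abs(b), a, b)
                           for a, b in zip(d1, d2)))     # max-absolute detail rule
    return pywt.waverec2(fused, wavelet)                 # fused intensity image
```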
The wavelet transform decomposes images into frequency components at different scales, encompassing both low-frequency information (related to the overall structure and general features of the image) and high-frequency information (related to the finer details and texture of the image). By integrating information from various frequencies, particularly the high-frequency details, it is possible to retain the subtle features of the image. This fusion process can employ methods such as weighting, thresholding, or other suitable approaches to amalgamate details from different scales and orientations, thereby maintaining or enhancing spatial precision during image fusion.
Moreover, wavelet transforms aid in identifying essential image features, such as edges, textures, and more. Prioritizing the preservation of these crucial features during fusion notably enhances spatial accuracy within the image. Reasonable utilization of high-frequency information during image merging effectively amplifies image details. This can be achieved through the selection of specific segments of high-frequency components or employing a fusion strategy focused on detail enhancement, thus contributing to heightened spatial precision during the image fusion process.
2.4. HIS, Wavelet, and Trust-Region Conjugate Gradient (TRCG-HW)
The TRCG-HW image fusion methodology represents a comprehensive approach aimed at achieving superior performance in multispectral image fusion. This method seamlessly integrates the HIS (Hue, Intensity, Saturation) transformation, wavelet transformation, and trust region algorithms to optimize the image fusion process. The HIS transformation plays a pivotal role in preserving spectral information, while the wavelet transformation significantly enhances spatial accuracy. The incorporation of trust region algorithms orchestrates and optimizes the entire process cohesively. The primary objective is to procure high-quality fused images while excelling across various performance metrics.
The trust region methodology functions as an optimization strategy that intricately balances local and global models by confining a specific space around an iteration point to simulate the objective function. In contrast, the conjugate gradient approach is a dedicated optimization methodology focused on minimizing the objective function by reducing residuals during step size and direction adjustments. In the realm of image fusion, these methodologies wield significant influence, refining the fusion algorithm profoundly. The trust-region technique adeptly oversees the optimization process at each stage, ensuring a gradual refinement of fusion outcomes within a localized model. On the other hand, the conjugate gradient technique operates as an iterative process, pinpointing the most optimal direction at each step, enabling rapid convergence towards a globally optimal solution. These methodologies collaborate seamlessly to orchestrate and refine the entire image fusion process, striving to preserve image characteristics while attaining exceptional quality in the resultant fused image.
2.4.1. Mathematical Principles of TRCG
In this subsection, we delve into the mathematical foundations that underpin the TRCG method. We examine key notions such as conjugate gradients, trust regions, and the rational fusion of these concepts to forge a pathway to efficient optimization. Consider an unconstrained nonlinear optimization problem:
$$\min_{x \in \mathbb{R}^{n}} f(x). \tag{5}$$
Using the trust region method to solve (5), we first give the current trust region trial step size $\Delta_k$ (conventionally called the trust region radius) and then solve a quadratic subproblem that approximates problem (5):
$$\min_{d} \; q_k(d) = f(x_k) + \langle g_k, d \rangle + \frac{1}{2} \langle d, B_k d \rangle, \tag{6}$$
$$\text{s.t.} \quad \|d\| \leq \Delta_k, \tag{7}$$
where $x_k$ represents the reference point; $d$ denotes a small increment or offset; $f(x_k)$ represents the value of the objective function at $x_k$; $g_k$ denotes the gradient $\nabla f(x_k)$; $B_k$ represents the Hessian matrix at $x_k$; $\langle g_k, d \rangle$ represents the inner product between the vectors $g_k$ and $d$; and $\langle d, B_k d \rangle$ denotes the quadratic form of the matrix $B_k$ applied to the vector $d$.
The trust region trial step $\Delta_k$ describes the extent to which we can trust the quadratic approximation model.
Next, we consider using the trust region method to solve the discrete operator equation
$$K f = h, \tag{8}$$
where $K$ is a PSP matrix, $f$ is the input to be sought, and $h$ is the measured output.
First, we form the following unconstrained least squares problem:
$$\min_{f} \; M[f] = \frac{1}{2} \| K f - h \|^{2}. \tag{9}$$
The gradient and Hessian matrix of the functional $M[f]$ can be explicitly calculated as $\nabla M[f] = K^{T}(K f - h)$ and $\nabla^{2} M[f] = K^{T} K$.
To solve (9) with the trust region algorithm (TRA), one needs to solve the following trust region subproblems (TRSs):
$$\min_{d} \; \langle \nabla M[f_k], d \rangle + \frac{1}{2} \langle d, K^{T} K d \rangle, \tag{10}$$
$$\text{s.t.} \quad \|d\| \leq \Delta_k. \tag{11}$$
In each step of the trust region iteration, the solutions of the subproblems (TRSs) (10) and (11) do not have to be too precise, which can be achieved by using the truncated conjugate gradient (TCG) method. The sequence of points generated by solving (10) is as follows:
$$\alpha_k = \frac{\|g_k\|^{2}}{\langle d_k, K^{T} K d_k \rangle}, \quad f_{k+1} = f_k + \alpha_k d_k, \quad g_{k+1} = g_k + \alpha_k K^{T} K d_k, \quad \beta_k = \frac{\|g_{k+1}\|^{2}}{\|g_k\|^{2}}, \quad d_{k+1} = -g_{k+1} + \beta_k d_k, \tag{12}$$
where $g_k$ refers to the gradient vector at iteration step $k$, $g_{k+1}$ refers to the gradient vector at iteration step $k+1$, $\|g_k\|$ denotes the Euclidean norm of the vector $g_k$, and $\|g_{k+1}\|$ denotes the Euclidean norm of the vector $g_{k+1}$. The initial values are $f_0 = 0$, $g_0 = \nabla M[f_0]$, and $d_0 = -g_0$.
If the current iterate remains inside the trust region, we accept it and move on to the next trust region iteration; if the new iterate or the search direction leads outside the trust region, we take the longest step that stays within the trust region and terminate the inner iteration.
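For illustration, the following Steihaug-style truncated CG sketch solves the trust region subproblem (10) and (11) approximately. It follows the standard truncated-CG recipe (truncate at the trust region boundary or on negative curvature), which may differ in detail from our implementation; the function names and tolerances are hypothetical.

```python
# Illustrative truncated CG for  min_d <g, d> + 0.5 <d, B d>  s.t. ||d|| <= delta.
# Bv is a callable returning the matrix-vector product B v (here K^T K v).
import numpy as np

def truncated_cg(Bv, g, delta, tol=1e-6, max_iter=100):
    d = np.zeros_like(g)
    r = g.copy()          # residual = gradient of the model at d = 0
    p = -r                # initial search direction
    for _ in range(max_iter):
        Bp = Bv(p)
        curv = p @ Bp
        if curv <= 0:     # negative curvature: step to the trust region boundary
            return _to_boundary(d, p, delta)
        alpha = (r @ r) / curv
        d_new = d + alpha * p
        if np.linalg.norm(d_new) >= delta:   # step leaves the region: truncate
            return _to_boundary(d, p, delta)
        r_new = r + alpha * Bp
        if np.linalg.norm(r_new) < tol:
            return d_new
        beta = (r_new @ r_new) / (r @ r)
        d, r, p = d_new, r_new, -r_new + beta * p
    return d

def _to_boundary(d, p, delta):
    # Largest tau >= 0 with ||d + tau p|| = delta (positive root of a quadratic).
    a, b, c = p @ p, 2 * (d @ p), d @ d - delta ** 2
    tau = (-b + np.sqrt(b * b - 4 * a * c)) / (2 * a)
    return d + tau * p
```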
2.4.2. Methodology
The method described in Figure 6 begins by applying the HIS transformation to the RGB color hyperspectral image (HSI). Simultaneously, the panchromatic (PAN) image undergoes preprocessing and is incorporated into the color sequence system as an "I" (intensity) component. Next, the wavelet transformation is applied to the "I" components from both the panchromatic image and the hyperspectral image for improved fusion. Through the optimization of wavelet coefficients using TRCG, a new "I" component is obtained. This new "I" component is merged with the "H" (hue) and "S" (saturation) components of the hyperspectral image. Eventually, by reversing the HIS transformation, the data are restored to the RGB color space, producing the final RGB color fusion image. Several meticulously designed steps are employed in merging multispectral images to achieve superior performance. These steps involve defining evaluation metrics and an objective function, performing the optimization process using TRCG, and conducting evaluation and adjustment stages.
Prior to optimization, evaluation metrics and an objective function are defined. The evaluation metrics encompass various aspects of the image, including structural similarity, signal-to-noise ratio, spectral information, rate of change, and correlation. These evaluation metrics are integrated into an objective function that comprehensively evaluates the quality of the fused image. For normalization, Min–Max standardization was utilized to scale each metric within a range of 0 to 1. Metrics such as SSIM, PSNR, and SAM, with higher values indicating better performance, were used directly after normalization. However, ERGAS and CC values were computed by subtracting their normalized scores from 1, aiming for lower values to signify superior performance.
The objective function is represented as follows:
$$F = w_1 \cdot \mathrm{SSIM}_{\mathrm{norm}} + w_2 \cdot \mathrm{SAM}_{\mathrm{norm}} + w_3 \cdot \mathrm{ERGAS}_{\mathrm{norm}} + w_4 \cdot \mathrm{CC}_{\mathrm{norm}} + w_5 \cdot \mathrm{PSNR}_{\mathrm{norm}},$$
where $w_1$, $w_2$, $w_3$, $w_4$, and $w_5$ symbolize the respective weights attributed to the individual evaluation metrics. These weights were assigned to determine the relative significance of each metric within the objective function. Assuming equal impact from all metrics, assigning uniform weights of 0.2 to the normalized SSIM, SAM, ERGAS, CC, and PSNR values facilitated the formation of a unified objective function. This method ensures an equivalent contribution from each metric to the overall objective function. It is worth noting that altering these weights might be more suitable if particular metrics influence the objective function to differing degrees; in such instances, adjusting the weights based on the specific influence of each metric could be more appropriate. The holistic assessment offers a comprehensive evaluation of image quality, minimizing potential biases inherent in individual metrics. Integrating the metrics into a single objective function not only saves time and energy but also reduces the potential misguidance of a single indicator, enhancing the reliability of decision making.
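A minimal sketch of this composite objective is shown below. It assumes the metric values and their min–max normalization bounds are supplied by the caller; the metric computations themselves are omitted, and the normalization directions simply mirror the description above.

```python
# Sketch of the weighted, normalized fusion objective described above.
import numpy as np

def normalize(value, vmin, vmax):
    return (value - vmin) / (vmax - vmin + 1e-12)     # min-max scaling to [0, 1]

def fusion_objective(metrics, bounds, weights=(0.2, 0.2, 0.2, 0.2, 0.2)):
    """metrics/bounds: dicts keyed by 'ssim', 'sam', 'ergas', 'cc', 'psnr'."""
    ssim  = normalize(metrics['ssim'],  *bounds['ssim'])
    sam   = normalize(metrics['sam'],   *bounds['sam'])
    ergas = 1.0 - normalize(metrics['ergas'], *bounds['ergas'])  # lower raw value preferred
    cc    = 1.0 - normalize(metrics['cc'],    *bounds['cc'])
    psnr  = normalize(metrics['psnr'],  *bounds['psnr'])
    w1, w2, w3, w4, w5 = weights
    return w1 * ssim + w2 * sam + w3 * ergas + w4 * cc + w5 * psnr
```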
The TRCG method employs a two-tier strategy, comprising an inner layer and an outer layer, to ensure efficient and accurate image fusion. The inner layer utilizes truncated conjugate gradient (TCG) techniques, which concentrate on localized optimization within specific regions. By computing gradient information and preserving the most significant components, TCG refines the fusion process on a local scale. It achieves this through iterative updates of wavelet coefficients, thereby enhancing fusion quality within these localized areas. On the other hand, the outer layer operates using the trust region algorithm (TRA) to oversee global convergence. TRA calculates the gradient of the objective function and supervises the entire optimization process, ensuring effective convergence across the image. It collaborates with the inner layer and dynamically adjusts the trust region radius and step size to strike a balance between efficiency and accuracy throughout the fusion process. By integrating these two layers—local optimization via TCG and global convergence management through TRA—the TRCG method aims to achieve a synergy that balances efficiency and accuracy, ultimately enhancing the overall quality of the fused image.
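The interplay of the two tiers can be outlined as follows, reusing the truncated CG routine sketched earlier as the inner solver. The radius-update thresholds and the cost formulation (minimizing, for example, $1 - F$) are illustrative assumptions rather than the exact settings used in our experiments, and the gradient and Hessian-product routines are placeholders.

```python
# Outline of the two-tier TRCG strategy: outer trust-region loop (TRA) calling
# the inner truncated CG (truncated_cg, sketched above) on a local quadratic model.
import numpy as np

def trcg_optimize(coeffs, objective, gradient, hess_vec,
                  delta=1.0, delta_max=10.0, eta=0.1, max_outer=50):
    x = coeffs.copy()
    for _ in range(max_outer):
        g = gradient(x)
        if np.linalg.norm(g) < 1e-6:
            break
        step = truncated_cg(lambda v: hess_vec(x, v), g, delta)   # inner TCG step
        pred = -(g @ step) - 0.5 * (step @ hess_vec(x, step))      # model decrease
        actual = objective(x) - objective(x + step)                # true decrease
        rho = actual / (pred + 1e-12)                              # agreement ratio
        if rho < 0.25:
            delta *= 0.25                        # poor agreement: shrink the region
        elif rho > 0.75 and np.isclose(np.linalg.norm(step), delta):
            delta = min(2.0 * delta, delta_max)  # good agreement at boundary: expand
        if rho > eta:
            x = x + step                         # accept the trial step
    return x
```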
4. Discussion
The TRCG-HW technique has undergone thorough assessment, employing both visual inspections and quantitative analyses on simulated and real datasets. Its consistent excellence in producing high-quality images, maintaining both structural and spectral information, and minimizing reconstruction errors has been convincingly demonstrated in comparison with other hyperspectral image fusion methods:
- (1) High Image-Quality Fidelity: The TRCG-HW method achieved outstanding scores in PSNR evaluations, indicating its ability to reconstruct images with high quality. It outperformed other methods in terms of image fidelity.
- (2) Preservation of Structural Information: The TRCG-HW method obtained significant scores in SSIM and CC evaluations, demonstrating its excellent performance in retaining structural information and maintaining a high level of correlation with the original data.
- (3) High Spectral Fidelity: The SAM and ERGAS scores for the TRCG-HW method indicate its effectiveness in preserving spectral fidelity, allowing images to better reflect the spectral characteristics of objects.
- (4) Minimal Reconstruction Errors: Qualitative results in the form of error maps illustrate that the TRCG-HW method achieved the fewest reconstruction errors, further substantiating its outstanding performance in preserving both spatial and spectral information.
In comparison to previous research, the novelty of integrating HIS, wavelet, and TRCG techniques into a unified framework is a significant contribution. HIS transformation aids in spectral information preservation, while wavelet transformation enhances spatial accuracy. The unique feature of the TRCG technique, optimizing images at both the local and global levels, allows the method to excel in various aspects. Furthermore, the TRCG-HW method does not focus solely on one aspect but integrates multiple performance metrics, signifying its well-rounded excellence in different aspects, resulting in overall superior performance.
The TRCG-HW method significantly differs by employing a hierarchical optimization approach to handle high-dimensional hyperspectral data. It utilizes an inner-layer local optimization and an outer-layer global optimization strategy. Local optimization helps reduce computational complexity, while global optimization approximates the true values. This provides it with a computational efficiency advantage, especially in large-scale hyperspectral datasets. The local optimization employs the truncated conjugate gradient (TCG) algorithm at each wavelet decomposition level to ensure the effective extraction and preservation of image details during the image reconstruction process. This enhances spatial accuracy and structural preservation. The outer-layer global optimization introduces the trust region algorithm (TRA) to coordinate features between different local optimization levels to ensure their consistent coordination throughout the entire image. It ensures that the TRCG-HW method can achieve a global optimum solution in both the spectral and spatial dimensions, resulting in higher image quality. Another key feature of the TRCG-HW method is its utilization of multiscale information extraction via wavelet transformation from panchromatic images, which improves spatial accuracy and preserves structural information.
While our study has produced promising outcomes, it is imperative to acknowledge its limitations. One notable constraint pertains to the computational intricacy inherent in utilizing wavelet transformation. The time-consuming nature of this process might pose challenges, especially when handling extensive hyperspectral datasets. Furthermore, our method primarily concentrates on optimizing intensity components within the HIS transformation. Although this reduction in computational complexity is beneficial, it might constrain the comprehensive reconstruction of spatial structures and intricate spatial details.
To mitigate the computational complexity concern, future investigations could explore parallel processing methodologies and hardware acceleration. These approaches could substantially enhance our method’s efficiency, rendering it more viable for real-time applications. Moreover, to overcome the limitations related to spatial details, further exploration could focus on integrating non-local self-similarity attributes of panchromatic and hyperspectral images. Leveraging these characteristics could augment the local optimization process within the TRCG-HW framework, elevating the overall performance in hyperspectral fusion, particularly in spatial reconstruction.
Future research endeavors could be focused on various aspects. Primarily, there should be a concerted effort to adapt the TRCG-HW method for real-time applications, particularly in domains like automated monitoring and decision support systems. This adaptation necessitates heightened performance and computational efficiency, enabling the method to swiftly process data in real-time scenarios. Further refinements targeting the enhancement of the TRCG-HW method’s performance, particularly in computational efficiency and data processing speed, would render it more compatible with real-time applications and large-scale data processing.
Secondly, while the TRCG-HW method exhibits prowess in hyperspectral image fusion, future investigations could extend its utility to diverse fields such as medical imaging, remote sensing, and surveillance. Researchers could explore methods to tailor the TRCG-HW approach to handle various data types, broadening its application spectrum. By applying this method to different data types, like multimodal images or stereoscopic images, it can cater to a wider array of field-specific requirements.
Lastly, an avenue for exploration could involve the fusion of hyperspectral stereo images or other multimodal data to acquire more comprehensive information. Such an approach could bolster a broader range of application domains, including, but not limited to, environmental monitoring and geological exploration. This could lead to more nuanced and enriched data interpretations, amplifying the method’s utility across diverse fields.