1. Introduction
Hyperspectral imaging has been used in many applications, including oil, gas, and mineral exploration, agriculture, remote sensing, quality control, forensic science, ecology, medicine, and others [1,2]. A hyperspectral camera captures the reflections of many bands of wavelengths called channels and stores the data in the form of stacked two-dimensional (2D) images. Since the images are captured together at the same location and time using different bands of wavelength, it is expected that overlaps of information can occur among images of adjacent bands. However, it is also known that some images contain exclusive information not available in others. When processed together, the images provide various complementing information useful for data analysis. For example, in hyperspectral data of oil palms, images of different bands are successfully used to detect early infections of Ganoderma, blast, freckle, and bagworms [3,4,5,6].
Oil palm hyperspectral images are usually acquired from satellites or drones. Often, channels of long and short wavelengths contain more noise than those of the medium wavelengths. Noise in the form of black and white speckles reduces the quality of the images and interferes with the extraction and analysis of features or information in them [
7,
8]. For example, noise in airborne hyperspectral data may interfere with the process of establishing a good correlation with the field spectroscopy data taken on the ground. Consequently, if the noisy channels cannot be cleansed successfully, they are discarded because their values are too volatile to be useful. It is observed that only short- and long-wavelength channels are severely affected by speckle noise, and the other channels that make up most of the data are unaffected. In many reported studies, it is common to contaminate clean images with synthetic noise before they are denoised with the proposed methods. In this way, the original signal strength is known and therefore the SNRs can be calculated. However, it is more important to test the method on real data with severe noise to show the effectiveness of a denoising method. In this study, a framework to filter noise in oil palm hyperspectral data is introduced. To the best of our knowledge, this is the first attempt to denoise oil palm hyperspectral data using such an approach.
Hyperspectral data are denoised using two main strategies: spatial–spectral approaches and deep learning approaches [9,10]. In essence, both seek to reduce variations, or increase similarity, in homogeneous areas by adjusting or replacing pixels with values deemed correct [11,12]. The effect is akin to lowpass filtering in the target areas [13]. Spatial–spectral approaches make use of spectral correlation, sparsity promotion, variation minimization, selective data content reduction, low-rank matrices or tensors, and others, with the aim of reducing the volatility that contributes to the noisy variations in the channels [14,15,16,17]. Deep learning methods normally require a large amount of data and substantial training to be effective.
Chang and Brumbley (1999) used a combination of a linear unmixing model and Kalman filter for subpixel detection and classification of remotely sensed images [
18]. Atkinson et al. (2003) introduced a wavelet-based estimation scheme for noisy hyperspectral signals that filters out the noise content of the data in a fashion similar to the Wiener filter [2]. Since the clean data were available, the correlation matrix of the channels could be established in a straightforward fashion. The method was successfully tested on Lunar Lake hyperspectral data with added noise. Zelinski and Goyal (2006) exploited the strong correlation between bands of hyperspectral data by enforcing sparsity of their wavelet representations to remove noise [
19]. Their method was tested on AVIRIS data that had been injected with synthetic noise, and it outperformed the wavelet-based global soft-thresholding method. It was also tested on actual noisy hyperspectral data to recover junk bands that had been disregarded.
Chen and Qian (2011) used a PCA–wavelet scheme to denoise hyperspectral data that had been contaminated by low-level noise. The method was designed to remove the subtle noise while preserving important details in the data. Important information in the hyperspectral data cubes was decorrelated from the noise using PCA, and the low-energy PCA output channels were eliminated together with the noise. Their method was tested on two AVIRIS data cubes, and they claimed that it produced better results than other denoising methods [20]. Chen et al. (2014) utilized PCA and block-matching 4D (BM4D) filtering to remove low-level noise from hyperspectral data. They claimed that it produced better results than their earlier PCA–wavelet scheme [21].
Zhang et al. (2013) proposed an efficient hyperspectral image (HSI) restoration method based on low-rank matrix recovery (LRMR) [
22]. Chang et al. (2020) claimed that nonlocal self-similarity is critical for denoising, and presented a one-way low-rank tensor recovery method to capture the structural correlations inherent in HSIs [
23]. In order to combine the spatial nonlocal similarity with the low-rank characteristics of the global spectrum, nonlocal meets global (NG-Meet) presents a unified spatial–spectral paradigm for HSI denoising. Xue et al. (2019) designed a nonlocal low-rank regularized tensor decomposition to fully utilize the global correlation across the spectrum and nonlocal self-similarity properties for HSI denoising [
24].
Maffei et al. (2019) proposed a denoising approach using deep learning and a CNN to filter hyperspectral images. The approach exploited the correlation between adjacent bands in the hypercube, and they claimed that it performed better on synthetic and real data than other available methods [
25]. Chang et al. (2018) introduced HSI-DeNet, which employed a deep convolutional neural network for HSI denoising. The network learned a series of multichannel 2D filters for the spatial and spectral structures of HSIs. The HSI denoising convolutional neural network (HSID-CNN) uses spatial and spectral information to recover clean images through two parallel feature extraction branches [
3].
Pan et al. (2022) utilized a method called SQAD that adopted a quasi-attention recurrent NN to filter noise from hyperspectral data [
26]. A spatial–spectral interactive restoration (SSIR) framework was proposed by Zhao et al. (2023) by exploiting the complementarity of model-based and data-driven methods. This method was implemented in removing noise in HSIs. A CNN and TF were used as a module for denoising that produced content-based interactions to better capture global heterogeneity differences in HSIs and achieved local–global dependence modeling. The unsupervised unmixing module improved the generalization ability of SSIR and enabled stable denoising performance [
27]. The two-dimensional non-subsampled shearlet transform (NSST) and fully constrained least squares unmixing (FCLSU) were implemented by Karami et al. (2016) to denoise low-noise (LN) and high-noise (HN) image bands separately. First, the spectral correlation was exploited to separate the LN bands from the HN bands. Next, NSST was applied to each spectral band of the hyperspectral images. Finally, a thresholding technique was applied to the shearlet coefficients to denoise the LN bands, and FCLSU was used to denoise the HN bands [
28].
Zhang et al. (2021) implemented HSI denoising and destriping with an extended HSI observation model by exploiting a double low-rank (DLR) matrix decomposition method [29]. Two low-rank constraints were formulated into one unified framework by concurrently exploring the low-rank characteristic of the lexicographically ordered noise-free HSI and the low-rank structure of the stripe noise on each band of the HSI [29]. A self-modulating convolutional neural network (SM-CNN) that exploits correlated spectral and spatial information was introduced by Torun et al. (2024). A novel block named the self-modulating residual block (SSMRB), which performs adaptive feature transformation based on the adjacent spectral data, acted as the core of the model. It enhanced the network’s ability to handle complex noise and adapted its predicted features while denoising each input HSI with respect to its spatial–spectral characteristics [
30].
Ertürk et al. (2016) proposed the use of spatial–spectral sparse unmixing to further enhance the unmixing and denoising performance. The method utilized a priori known spectral libraries and calculated spatially smooth abundances for the sparse number of prominent signatures detected in the image. These signatures and their respective abundances were used to reconstruct a denoised version of the original hyperspectral image [
31].
In summary, denoising of hyperspectral data is implemented using two main approaches: spatial–spectral approaches and deep learning approaches. In most of the studies, the approaches were tested on data that had been spiked with synthetic noise. In this work, we propose a filtering framework that is effective and fast to implement. The effectiveness of the approach is tested on some channels of real oil palm hyperspectral data that are inherently noisy.
2. Methodology
Noise in hyperspectral images is mainly caused by changes in the reflection characteristics across the wavelength spectrum during the aerial image acquisition. Humidity, airborne particles, cloud shading, and bright reflectance from the background may introduce noise or reduce contrast of the hyperspectral images [
20,
21]. Object movements caused by blowing wind may blur hyperspectral images. When the area to cover is large, many flight sorties over a period of several days are required to capture it. Luminosity and contrast variations may occur when the data are taken at different brightness levels and ambient conditions. When the images are stitched, differences in luminosity and contrast become evident. In addition, water droplets, debris, and other floating particles in the air may absorb, reflect, or scatter the wavelengths used to capture the images and thus create noise or decrease the contrast between foreground and background objects. Hyperspectral data with a high resolution are always desirable, as they contain more information. When the resolution is low, each pixel is more likely to cover several overlapping objects [
32,
33].
2.1. Database
The area under consideration has nearly 20 thousand oil palms of Dura x Pisifera (DxP) type whose age is slightly more than eleven years. It is a flat area that records a consistent yearly rainfall of roughly 2700 mm. Most of the oil palms are healthy and they bear fruit periodically all year long. The camera used to acquire the hyperspectral data was a Resonon Pika L (Resonon Inc., Bozeman, MT, USA). It was carried by a DJI M600 Pro hexacopter drone, Dà-Jiāng Innovations Science and Technology Co., Ltd., Shenzhen, China. The hyperspectral data contained 300 spectral bands with wavelength from 350 nm to 1000 nm. The spectral resolution was 2 nm. The drone can haul a maximum load of 6 kg and sustain a flight duration of 15–20 min. A calibration tarpaulin was used as a reference on the ground.
As much as possible, data recording was carried out under clear sky, albeit at different sunlight intensity depending on the flight time and cloud shading. The altitude of the UAV was maintained at approximately 80 m during data sorties, and that translated to a resolution of nearly 10 cm × 10 cm of ground area for each pixel. The average speed of the drone was approximately 25 km/h during data acquisition.
Figure 1 shows part of the area under consideration, and Figure 2 shows a pair of the lowest- and highest-wavelength channels that were affected by noise and luminosity variation. Note that the intensity range of the images has been scaled to 0–255 for display. The short-wavelength channels appear to contain more noise than the long-wavelength channels.
2.2. Approach
In this work, one-dimensional (1D) filtering was used to filter the speckle noise that normally affects the low- and high-wavelength channels of oil palm hyperspectral data. The main idea is that the intensity of a pixel at a particular location varies slowly within the channel and across the neighboring channels. Sharp changes in the intensity of a particular pixel are therefore attributed to noise and can be lowpass-filtered. The same 1D filter is applied across the channels and within each channel. 1D filtering was chosen as the workhorse of the framework because it is fast to execute and suitable for implementing Wiener and Kalman filters. The flow of steps taken to filter noise from the hyperspectral data is shown in Figure 3.
The first step in the process was to read the hyperspectral data and store them in a 3D array. The size of the data was limited to 500 × 500 × 300. From experience, it is found that only the channels that belong to the low wavelength of 350–380 nm and high wavelength of 900–1000 nm are visibly noisy. Thus, filtering was focused on these channels, and they were filtered together in two separate groups. It was observed that the noise level at both ends of the wavelength became worse as the channel wavelength decreased toward 350 nm and increased toward 1000 nm.
The second step was to identify some 50 × 50 patches of homogeneous or uniform areas with low means and low standard deviations (stds). This step was executed offline on one of the cleaner channels automatically. Since the locations of the homogeneous areas were the same for all bands, the relative noise levels in the noisy channels were able to be estimated from those areas. However, it should be noted that the SNRs are only estimates. The means and stds of the identified patches were calculated along with the means of the channels. Using these parameters, the signal-to-noise ratios (SNRs) of the channels were estimated as follows [
34]:
where:
µ is the average intensity of the channel.
σ is the average of the standard deviations of the homogeneous areas.
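As an illustration of how such an estimate could be computed, the following minimal Python sketch assumes the simple ratio form SNR = µ/σ. This form follows from the two definitions above, but it may differ from the exact expression in [34] (for example, a decibel form). The function name, the patch-coordinate format, and the use of the population standard deviation are hypothetical choices made for this sketch.

```python
from statistics import mean, pstdev


def estimate_snr(channel, patches, size=50):
    """Estimate the SNR of one channel as mu / sigma (assumed ratio form).

    channel: 2D list of pixel intensities.
    patches: list of (row, col) top-left corners of homogeneous patches.
    size:    side length of each square patch (50 in the text).
    """
    # mu: average intensity of the whole channel
    mu = mean(v for row in channel for v in row)

    # sigma: average of the standard deviations of the homogeneous patches
    stds = []
    for r0, c0 in patches:
        block = [channel[r][c]
                 for r in range(r0, r0 + size)
                 for c in range(c0, c0 + size)]
        stds.append(pstdev(block))  # population std is an assumption here
    sigma = mean(stds)

    # Guard against perfectly flat patches
    return float("inf") if sigma == 0 else mu / sigma
```

Because the patch locations are shared by all bands, the same `patches` list could be reused for every channel, which is what makes the estimates comparable across channels.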
The SNR estimates were compared against a threshold (Th), and channels whose SNRs fell below the Th were selected for denoising. The third step was 1D spectral filtering that exploited the strong correlations among consecutive bands. It started with collecting the intensity values of every pixel (x,y) of the noisy images and storing them in a 1D array. Note that the noisy channels of low and high wavelengths were collected and processed separately. For each pixel, the values in the array were filtered starting from the band with the lowest noise towards the band with the highest noise at the wavelength ends.
Figure 4 depicts how the pixel values of the noisy bands at one location (x,y) were collected in a 1D array. The 1D spectral filtering was conducted as follows.
Suppose that there are M noisy images and the pixel values at a location (x,y) are collected in a 1D array. Let s(m) be the array containing the pixel values of one location (x,y) from M channels, where m = 1, 2, …, M. The values are stored sequentially channel by channel. The linear Kalman filter is an algorithm that is implemented by a set of recursive equations. The successive prediction and correction equations of the Kalman filter at element i of the array, s(i), are simplified as follows [35]:

$$\hat{x}^{-}(i) = \hat{x}(i-1)$$

$$P^{-}(i) = P(i-1) + Q$$

$$K(i) = \frac{P^{-}(i)}{P^{-}(i) + R}$$

$$\hat{x}(i) = \hat{x}^{-}(i) + K(i)\,\left[s(i) - \hat{x}^{-}(i)\right]$$

$$P(i) = \left[1 - K(i)\right] P^{-}(i)$$

where $\hat{x}(i)$ is the output value generated by the Kalman filter. Q and R are the process and sensor noise variances that are assumed to be constant and known. P(i) and K(i) are the variance of the state x(i) and the Kalman gain that evolve together with the estimated state (output) $\hat{x}(i)$. $\hat{x}^{-}(i)$ and $P^{-}(i)$ are interim values that are necessary in the estimation process.
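The prediction and correction recursion described above can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard scalar Kalman filter with an identity state model. The function name, the default initial variance, and the choice to initialize the state with the first sample are assumptions of this sketch rather than details taken from the paper.

```python
def kalman_1d(s, q, r, x0=None, p0=1.0):
    """Scalar Kalman filter over a 1D array s (assumed identity state model).

    q, r: process and sensor noise variances, assumed constant and known.
    x0:   initial state estimate (defaults to the first sample; an assumption).
    p0:   initial state variance (hypothetical default).
    """
    x = s[0] if x0 is None else x0
    p = p0
    out = []
    for z in s:
        # Prediction: carry the previous estimate forward, inflate its variance
        x_pred = x
        p_pred = p + q
        # Correction: blend the prediction with the measurement z
        k = p_pred / (p_pred + r)          # Kalman gain
        x = x_pred + k * (z - x_pred)      # updated state estimate (output)
        p = (1.0 - k) * p_pred             # updated state variance
        out.append(x)
    return out
```

In the spectral step, `s` would hold the values of one pixel ordered from the least noisy band to the noisiest band, so the recursion starts from the most reliable measurement.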
The adaptive Wiener filter adjusts the value of a pixel depending on the local mean μ(i) and variance σ²(i) of the neighborhood N centered at s(i) [36]. The following equations describe the implementation of the filter:

$$\mu(i) = \frac{1}{|N|} \sum_{s(i-n) \in N} s(i-n)$$

$$\sigma^{2}(i) = \frac{1}{|N|} \sum_{s(i-n) \in N} s^{2}(i-n) - \mu^{2}(i)$$

$$\hat{s}(i) = \mu(i) + \frac{\sigma^{2}(i) - \nu^{2}(i)}{\sigma^{2}(i)}\,\left[s(i) - \mu(i)\right]$$

where $\hat{s}(i)$ is the filtered value, |N| is the number of values in N, and ν²(i) is the noise variance of the local neighborhood. If it is unknown, it is replaced by the average of all the local variances. For most pixels in the array, the neighborhood N consists of the 2w + 1 values s(i − n), where n ∈ [−w, w], but if the pixel is near the beginning or end of the array, there will be fewer pixels in N. In the spectral filtering, the length of the filter used was 5, and thus w was 2. Three other 1D filters that were tested in the same framework were the Savitzky–Golay, wavelet, and cosine filters.
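A minimal Python sketch of a 1D adaptive Wiener filter along these lines is given below. It assumes the common local-statistics form, in which the deviation from the local mean is scaled by (σ² − ν²)/σ². The function name is hypothetical, and clamping the gain at zero is a safeguard added for this sketch.

```python
def adaptive_wiener_1d(s, w=2, noise_var=None):
    """1D adaptive Wiener filter with a neighborhood of up to 2*w + 1 values."""
    n = len(s)
    means, variances = [], []
    for i in range(n):
        # Neighborhood N: truncated near the beginning and end of the array
        window = s[max(0, i - w):min(n, i + w + 1)]
        m = sum(window) / len(window)
        v = sum(x * x for x in window) / len(window) - m * m
        means.append(m)
        variances.append(max(v, 0.0))  # guard against tiny negative round-off

    # Unknown noise variance: use the average of all the local variances
    if noise_var is None:
        noise_var = sum(variances) / n

    out = []
    for x, m, v in zip(s, means, variances):
        # Gain is clamped at zero so flat regions collapse to the local mean
        gain = max(v - noise_var, 0.0) / v if v > 0 else 0.0
        out.append(m + gain * (x - m))
    return out
```

With `w=2` the neighborhood length is 5, matching the filter length used in the spectral filtering.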
The Savitzky–Golay filter fits the pixel values in N to a polynomial function in the least-squares sense. The output is the value of the fitted polynomial at the position of the center pixel. It depends on the pixel values in the neighborhood and the polynomial function used for the least-squares fitting. To be consistent with the other filters, the length of the neighborhood was 5, while the degree chosen for the polynomial was 1. The wavelet and cosine filtering were performed via transformation, thresholding for coefficient reduction, and inverse transformation. Daub4 was chosen as the basis for the wavelet transformation, as it is compact and requires less padding at the endpoints of the arrays. Filtering was achieved by setting some detail coefficients to zero or decreasing them via soft thresholding. Similarly, for the cosine transformation, some high-frequency components were suppressed or set to zero by the same strategy. Usually, the threshold values of the wavelet and the cosine coefficients are obtained by the fixed-form method. However, since the actual std of the noise was unknown, the threshold was set at 10% of the absolute value of the highest coefficient. After thresholding, the coefficients were inverse transformed back to the original domain. The threshold value can be incremented slowly in subsequent iterations.
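The transform, threshold, and inverse-transform sequence can be illustrated for the cosine case with the following Python sketch. It uses a direct orthonormal DCT-II and its inverse so that it needs only the standard library. The function names are hypothetical, and leaving the DC coefficient untouched is an assumption of this sketch, since the text only mentions suppressing high-frequency components.

```python
import math


def dct(x):
    """Orthonormal DCT-II of a 1D list (direct O(n^2) evaluation)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(x[j] * math.cos(math.pi * (j + 0.5) * k / n)
                               for j in range(n)))
    return out


def idct(c):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for j in range(n):
        total = c[0] * math.sqrt(1.0 / n)
        total += sum(math.sqrt(2.0 / n) * c[k]
                     * math.cos(math.pi * (j + 0.5) * k / n)
                     for k in range(1, n))
        out.append(total)
    return out


def threshold(coeffs, fraction=0.10, mode="hard"):
    """Threshold at `fraction` of the largest absolute coefficient.

    The DC term coeffs[0] is kept unchanged (assumption of this sketch).
    """
    t = fraction * max(abs(c) for c in coeffs)
    rest = coeffs[1:]
    if mode == "hard":
        # Hard: zero the small coefficients, keep the others as they are
        rest = [c if abs(c) > t else 0.0 for c in rest]
    else:
        # Soft: shrink every coefficient toward zero by t
        rest = [math.copysign(max(abs(c) - t, 0.0), c) for c in rest]
    return [coeffs[0]] + rest


def cosine_denoise(x, fraction=0.10, mode="hard"):
    """Transform, threshold, and inverse transform a 1D array."""
    return idct(threshold(dct(x), fraction, mode))
```

Raising `fraction` in later iterations would correspond to incrementing the threshold slowly, as mentioned above.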
In the 1D spatial filtering, each noisy channel was subjected to vertical and horizontal filtering separately. The filters were used in the same way as in the 1D spectral filtering. For the Wiener and Savitzky–Golay filters, the size of the filter was still 5, since a longer filter would blur the image, eliminate thin leaves from the fronds, and take longer to execute. The wavelet and cosine transformation were performed in horizontal and vertical directions consecutively. Hard thresholding that sets some detail coefficients to zero produces better results than soft thresholding, which shrinks the entire coefficient range. Then, the images were inverse transformed back to the spatial domain. The spectral and spatial filtering were repeated until the maximum number of iterations allowed was reached.
After filtering, the SNRs of the noisy channels were recalculated to check the improvement achieved. The SNRs were compared to the ones before filtering to obtain the gain in SNR for each channel. The execution times of filtering the noisy channels using different filters were logged. For display, the pixel values of the channels were normalized so that they fell between 0 and 255.
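The display normalization can be sketched as a min–max rescaling. The text does not state which normalization was used, so the linear mapping and the function name below are assumptions of this sketch.

```python
def normalize_for_display(channel):
    """Linearly rescale a 2D list of pixel values to integers in 0..255."""
    lo = min(min(row) for row in channel)
    hi = max(max(row) for row in channel)
    if hi == lo:
        # Flat image: map everything to zero to avoid dividing by zero
        return [[0 for _ in row] for row in channel]
    scale = 255.0 / (hi - lo)
    return [[round((v - lo) * scale) for v in row] for row in channel]
```

Applying this to a copy of each channel leaves the filtered data unchanged for the SNR comparison.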
3. Results and Discussion
The proposed method was implemented in MATLAB R2023b on a computer with an Intel i7 multicore processor (Intel, Santa Clara, CA, USA). It was applied to hyperspectral data containing 300 bands, and the size of the image of each band was 500 × 500 pixels. The homogeneous areas of 50 × 50 pixels with the lowest means and stds were searched automatically. For noisy images, the stds of these areas were higher, since they contained higher noise levels than the ones in clean images. After calculating the SNR estimates of all channels, the cleanest channel, that is, the one with the highest SNR, was identified. The SNR threshold (Th) for determining noisy channels was set at 30% of the highest SNR. All channels whose SNRs fell below the Th were considered noisy. The maximum number of iterations was set at two. The Th and the maximum number of iterations were selected by trial and error. For the data in this experiment, the noisy channels were the first 8 channels and the last 40 channels of the hyperspectral data. They were clustered into two groups and processed separately.
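The channel selection rule can be sketched as follows. The function names are hypothetical, and grouping the selected channels into runs of consecutive indices is an assumption about how the two clusters at the ends of the spectrum might be formed.

```python
def select_noisy_channels(snrs, fraction=0.30):
    """Return indices of channels whose SNR is below fraction * max(SNR)."""
    th = fraction * max(snrs)
    return [i for i, snr in enumerate(snrs) if snr < th]


def group_contiguous(indices):
    """Split sorted channel indices into runs of consecutive values."""
    groups = []
    for i in indices:
        if groups and i == groups[-1][-1] + 1:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups
```

For data like those in this experiment, the selected indices would be expected to form two runs, one at each end of the wavelength range.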
For each group, the channels were arranged sequentially. The collection of pixel values in a 1D array started from the pixel of the channel with the least noise to the channel with the worst noise. Then, the array of each pixel location was subjected to 1D spectral filtering. This was followed by 1D spatial filtering in horizontal and vertical directions. The 1D spectral and spatial filtering were repeated until the number of iterations exceeded the maximum (Max) allowed. After that, the SNRs of the filtered channels were recalculated. All the noisy channels showed marked improvement. Finally, pixel values in copies of the denoised channels were normalized for display.
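The overall loop for one group of noisy channels can be sketched as below. This is a simplified illustration, not the authors' implementation: the cube layout `cube[m][r][c]`, the function names, and the stand-in moving-average filter are assumptions, and the horizontal and vertical passes are applied one after the other within each iteration.

```python
def moving_average(s, w=2):
    """Stand-in 1D lowpass filter; any 1D filter with this interface fits."""
    n = len(s)
    out = []
    for i in range(n):
        window = s[max(0, i - w):min(n, i + w + 1)]
        out.append(sum(window) / len(window))
    return out


def denoise_group(cube, order, filt, max_iter=2):
    """Alternate 1D spectral and 1D spatial filtering, modifying cube in place.

    cube:  nested lists indexed as cube[channel][row][col].
    order: channel indices of one group, from least noisy to noisiest.
    filt:  function mapping a 1D list to a filtered 1D list of equal length.
    """
    rows, cols = len(cube[0]), len(cube[0][0])
    for _ in range(max_iter):
        # 1D spectral filtering: one array per pixel location
        for r in range(rows):
            for c in range(cols):
                filtered = filt([cube[m][r][c] for m in order])
                for m, v in zip(order, filtered):
                    cube[m][r][c] = v
        # 1D spatial filtering: rows (horizontal), then columns (vertical)
        for m in order:
            img = cube[m]
            for r in range(rows):
                img[r] = filt(img[r])
            for c in range(cols):
                column = filt([img[r][c] for r in range(rows)])
                for r in range(rows):
                    img[r][c] = column[r]
    return cube
```

Passing a different `filt`, such as a Kalman or Wiener routine with the same interface, would swap the filter without changing the loop.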
In the experiments, the maximum number of iterations was limited to two to minimize the execution time and preserve the channel content from excessive filtering. The average SNR gain and execution time after filtering the first and last four channels are given in
Table 1 and
Table 2, respectively.
The average execution time for processing a 500 × 500 channel was less than 1 second for all filters except the Savitzky–Golay filter. It took longer to implement the Savitzky–Golay filter, as it required fitting a polynomial of degree 1 to the data points.
Figure 5 and
Figure 6 show the results of applying the filters to the first and last four channels, respectively. As observed, all filters managed to improve the images significantly. The Kalman filter was the fastest to implement and produced the best results. It managed to filter a significant amount of noise from the images, balance the luminosity drift in the first four channels, and remove the stitching line in them. This is probably because the Kalman filter is a model-based approach that estimates the current state by combining noisy measurements and uncertain past predictions. It operates recursively, updating the state estimates as new observations become available by revising the state variance and Kalman gain. The Kalman filter is optimal if the probability density functions of the states and noises are all Gaussian. Unlike the other lowpass filters, the Kalman filter generates a new weight with each measurement, rather than using the same fixed coefficients. The Kalman filter takes into account the underlying system model and noise characteristics to provide an estimate. When the Gaussian assumption on the noises and state of the model is not met, the performance of the Kalman filter becomes suboptimal. However, this does not mean that it will not work, and it can still produce respectable performance.
The Kalman filter requires neither a filter length nor a threshold. In the experiments, the best filter length for the Wiener and Savitzky–Golay (SG) filters was five. Increasing it above five would have blurred the output because of the stronger lowpass filtering effect. This is because the Wiener filter depends on the mean of the local neighborhood, which is in practice obtained by a lowpass moving average filter. The SG filter can be viewed as a lowpass FIR filter whose coefficients depend on the polynomial function used. As with any other FIR filter, increasing its size would blur the output. Furthermore, a longer SG filter is much slower to execute, since it requires fitting a polynomial to more pixel values by minimizing an error function. Shortening the filters to a length of three would make them too short to be effective. In summary, the filter size of five was obtained by trial and error to produce the best results. Making the size adaptive would lead the algorithm to choose size five eventually, although its execution time would increase considerably. The length of the wavelet filter was fixed at four, since we adopted the Daub4 basis. Changing its size entails changing the basis, which might not be favorable. There was no size for the cosine filter, as the cosine transform was performed on the entire row and column of the channel image. The threshold for the wavelet and cosine coefficients was set at 10% of the absolute value of the highest coefficient. This value was also obtained by trial and error. When the threshold was increased, more high-frequency components were suppressed, resulting in a blurry output. Conversely, when the threshold was reduced, more noise remained in the output.
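The FIR view of the SG filter can be checked numerically. The sketch below fits a least-squares line to a symmetric window and evaluates it at the center; feeding unit impulses through the fit recovers the equivalent FIR coefficients. The function names are hypothetical, and the sketch covers interior samples with a degree-1 polynomial only.

```python
def linear_fit_at_center(window):
    """Value at the center of a least-squares line fitted to an odd window."""
    n = len(window)
    half = n // 2
    xs = list(range(-half, half + 1))   # positions symmetric about zero
    mean_y = sum(window) / n
    slope = sum(x * y for x, y in zip(xs, window)) / sum(x * x for x in xs)
    # mean(xs) is zero, so the intercept equals mean_y; evaluate at x = 0
    return mean_y + slope * 0.0


def equivalent_fir_coefficients(length=5):
    """Recover the FIR coefficients by filtering unit impulses."""
    coeffs = []
    for k in range(length):
        impulse = [0.0] * length
        impulse[k] = 1.0
        coeffs.append(linear_fit_at_center(impulse))
    return coeffs
```

For a length of 5 and degree 1, every coefficient comes out as 0.2, so at interior samples this particular SG configuration coincides with a 5-point moving average. Higher polynomial degrees or asymmetric edge windows give different coefficients.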
Normally, besides SNR, SSIM and PSNR are calculated as denoising metrics to provide relative filter performance. They compare a noisy image to its clean self to come up with the index. However, since there are no clean copies of the noisy channels, reference-less image quality metrics like NIQE can be utilized. NIQE is trained on relatively clean images and may not correlate well with human quality perception [
34]. However, it does not use subjective quality scores. NIQE assigns a score to an image, and the lower the score, the better the visual quality of the image should be.
Table 3 presents the NIQE metrics of the first four and last four noisy channels before and after filtering. As seen, after filtering, all channels registered lower NIQE indices.
It was found that the spatial filtering by itself was not effective in denoising the channels. When the noisy bands were subjected to spatial filtering repeatedly, they lost contrast and became blurry. Applying the 1D spectral filtering alone iteratively reduced the noise, but the improvement tapered off just after a few iterations. When spectral and spatial filtering were combined, they produced better results, but increasing the maximum iteration did not improve the results after the second iteration. This was due to the blurring effect of lowpass filtering. The main concern of this work was whether the spectral and spatial filtering of the channels suppress important local features within individual channels, resulting in the loss of channel characteristics. This concern will be addressed in future work.