1. Introduction
Image spectroscopy is an optical remote sensing technique that integrates imaging and spectral detection. The data cubes obtained during capture contain two-dimensional spatial images of objects and one-dimensional spectral radiance (also known as spectral information). This allows the spectral curve of a target to be retrieved from any pixel in the data cube, while also providing spatial images in different wavelength bands [1,2,3,4]. Due to its capacity to analyze and identify objects based on both geometric shape and spectral characteristics, image spectroscopy has found widespread application in fields such as the military, exploration, and environmental monitoring. Traditional image spectroscopy techniques suppress aberrations and enhance spectral quality by increasing the number of optical components in a system and using detectors with better performance. This approach effectively reduces the impact of system aberrations through different combinations of optical elements, improving the quality of spectral information. However, it leads to complex optical systems, bulky equipment, and high costs. Moreover, once the number of optical components reaches a certain point, adding more yields diminishing improvements in spectral quality. Therefore, a new approach to enhancing spectral quality is needed. With the development of computer technology, computational power has increased significantly, and further optimization and processing of existing spectral information can again improve spectral quality; this approach relies on computational imaging technology [5,6].
Single-lens imaging is a development based on computational imaging techniques. It replaces a traditional complex optical system with an optical system composed of a single lens element. By employing back-end algorithms for processing instead of front-end optical correction, it not only reduces the complexity of the optical system but also yields images that meet specific requirements. The earliest proponents of single-lens imaging were Schuler et al., who, in 2011, created a single-lens camera containing only one lens element [7]. They introduced an alternating algorithm for removing mosaic artifacts and image blurriness using images captured by the camera to validate the effectiveness of their algorithm in mitigating optical aberrations and blurriness. Building upon Schuler’s work, Heide et al. developed specialized image restoration algorithms tailored for single-lens imaging [8]. Their algorithms suppressed optical aberrations, reducing the complexity, weight, and cost of the front-end optical system. They introduced the cross-channel prior deconvolution algorithm, observing that the edge information of objects in the R, G, and B channels shares similar positions. They used the information from one channel as prior knowledge to deconvolve blurry images from the other two channels, significantly improving the quality of the final restored images. Li Weili et al. utilized the front lens element of a Canon EF 50 mm F1.8 II lens to construct a single lens and adapted it for use with a Canon 5D Mark II camera [9]. They developed a blind deconvolution image restoration algorithm based on maximum a posteriori probability for blurry images obtained with their single-lens imaging system. They introduced new priors related to the structure of the blur kernel and smooth color transitions in the images [10,11], enhancing the accuracy of point spread function (PSF) estimation and, consequently, improving the quality of their final image restoration.
Utilizing a single-lens imaging system in conjunction with algorithms allows clear images to be obtained. However, two-dimensional image data cannot fully represent certain physical properties of objects. Spectral information can compensate for this limitation. Scholars from various parts of the world have conducted research on whether it is possible to use the straightforward optical structure of single-lens imaging systems in combination with algorithms to acquire spectral information.
In 1995, Lyons, in the United States, proposed a new structure for an imaging spectrometer [12]. This structure primarily utilizes the dispersive properties of a binary optical element (BOE), enabling spectral imaging in the visible and near-infrared wavelength ranges. The BOE images different wavelengths at different positions, and a charge-coupled device (CCD) scans along the optical axis to obtain image information in the desired spectral bands. A monochromatic CCD is used in this setup. The image received by the CCD consists of an accurately focused image and overlapping images formed by other wavelengths at different defocusing positions. Post-processing using computed tomography techniques eliminates the unwanted blur components, leaving only the images corresponding to each wavelength. Yubin et al. designed a visible-band imaging spectrometer experimental setup that utilizes the axial dispersion of binary optical elements [13]. Its spectral range is 500 nm to 900 nm, with a spectral resolution of 6.4 nm @ 632.8 nm. The system has an F-number of F/8, a field of view of 1.3 degrees, a CCD pixel size of 15 × 15 μm, and a pixel count of 512 × 512. The authors used a three-dimensional optical sectioning microscopy technique for spectral restoration and proposed three deconvolution algorithms suitable for imaging spectrometers with binary optical elements: the nearest-neighbor method, inverse filtering, and constrained iterative deconvolution [14]. Oğuzhan Fatih Kar and others introduced a simple and fast computational imaging spectrometer using a single programmable diffractive lens. They also proposed a rapid spectral restoration algorithm based on the alternating direction method of multipliers for effectively restoring spectral information under different signal-to-noise ratios [15].
Image deblurring is the fundamental element of spectral information recovery based on single-lens imaging. In an imaging system, the process of image formation can be described as the convolution of an ideal image with a blur kernel. Therefore, deconvolution, as the inverse process of convolution, theoretically allows for the restoration of a clear image from a blurred one. For deconvolution-based deblurring algorithms, there is close integration of physical considerations and mathematical principles, so that image quality can be improved without altering the physical design of the system or the imaging environment.
Deblurring algorithms can be categorized into non-blind and blind methods according to the prior information available about the blur kernel. Non-blind methods usually do not estimate the blur kernel, which is instead obtained through direct measurement or simple approximation [16]. With further research, regularization techniques have been introduced for image deblurring. One example is the total variation (TV) regularization term [17], which is designed on the basis of image gradients. TV regularization emphasizes the gradient information of an image: when the regularization weight coefficient is large, it recovers texture details better, while a smaller weight produces smoother results. TV regularization therefore combines both denoising and texture-preservation characteristics. Some researchers, such as Chen et al., have used channel correlation properties in multispectral images to guide spectral information recovery. They obtained a guiding image for each blurry image, computed its gradients, and used these as prior information for spectral recovery, resulting in high-quality spectral restoration [18].
The major difference between blind and non-blind methods lies in the blind estimation of the blur kernel. Estimating blur kernels is a critical issue in image restoration algorithms, as the accuracy of blur kernel estimation determines the quality of image restoration. Blind deblurring models face more severe difficulties because their blur kernels are unknown. A representative approach is the variational Bayesian multi-scale blind deconvolution method proposed by Fergus and his team. Initially, they used Bayesian methods to iteratively estimate blur kernels based on a maximum a posteriori (MAP) model [19,20]. This iterative process proceeds from coarse to fine within a spatial pyramid scale space. Subsequently, the authors reconstructed a clear image using the Richardson–Lucy (RL) method [21,22]. However, due to the limitations of the standard RL deconvolution algorithm in suppressing ringing artifacts, the images they achieved exhibit noticeable ringing effects, as seen in their paper. Levin and his team proposed an improved variant of blind image restoration based on the effective edge similarity of an image, building upon the method of Fergus et al. [23]. This method also operates within the maximum a posteriori (MAP) framework. Its primary contribution lies in the processes of updating and estimating the blur kernel: it takes into account not only the influence of the blurred image itself on the estimation of the blur kernel but also the impact of the covariance of the latent clear image on the estimation process. Q. Shan and his colleagues proposed a unified probabilistic model for both blind and non-blind deconvolution, addressing the respective maximum a posteriori (MAP) problems through advanced iterative optimization. This optimization alternates between refining the blur kernel and restoring the image until convergence. The algorithm can be initialized with a rough estimate of the blur kernel and ultimately yields results that preserve complex image structures while avoiding ringing artifacts [24]. Krishnan proposed an algorithm that uses the ratio of the L1 and L2 norms as a regularization constraint [25], allowing a blurry image to gradually become clear. The computation first places this constraint in the loss function and then alternates between estimating the clear image and the blur kernel, ultimately yielding a more accurate blur kernel estimate. After obtaining the blur kernel, the author utilized a hyper-Laplacian prior model for non-blind image deconvolution, producing the final restored image.
In a previous study by our group, Baiyang compared non-blind deblurring using TV regularization with blind deblurring based on the MAP framework [26]. In the visible wavelength range, when constrained with the TV regularization term utilizing gradient information, the non-blind method outperformed the MAP method, effectively suppressing ringing artifacts, preserving texture details better, and computing faster.
As the single-lens-based spectral acquisition device in this study only contains a single-lens element, the system’s point spread function (PSF) can be directly measured, making it more suitable for non-blind methods. Building on the previous research and achievements of our research group, the TV regularization term was selected for constrained solving, and this study mainly focuses on optimizing and improving the TV regularization term. The proposed algorithm is based on gradient prior information optimization and enhances similarity to the original image, ultimately improving the quality of spectral restoration. In comparison to unoptimized gradient prior information, the proposed algorithm stands out in terms of restoration quality. The feasibility of this algorithm was validated using publicly available remote sensing datasets and actual captured images.
3. Methods
Image restoration is a deconvolution process. If the point spread function (PSF) of an imaging system is known, the original clear image can be obtained by deconvolving the blurred image with the PSF [27]. In practical situations, the influence of noise also needs to be considered. The entire process is shown in Figure 2.
In the spatial domain, it can be represented as:

g(x,y) = h(x,y) * f(x,y) + n(x,y)

In the above equation, g(x,y) represents the blurred image, h(x,y) represents the blur kernel, f(x,y) represents the original image, n(x,y) represents noise, and “*” denotes convolution.
Since digital images are discrete, the above model can be represented as:

g(i,j) = Σ_m Σ_n h(i − m, j − n) f(m,n) + n(i,j)

If there is no noise, n(i,j) = 0.
The convolution operation in the spatial domain can be transformed into a multiplication operation in the frequency domain; thus:

G(u,v) = H(u,v)F(u,v) + N(u,v)

In the above equation, G(u,v), H(u,v), F(u,v), and N(u,v) correspond to the Fourier transforms of g(x,y), h(x,y), f(x,y), and n(x,y), respectively. When a blurred image is acquired and prior information about the blur kernel and noise is known, restoration can be achieved through deconvolution, with the aim of recovering an image that closely resembles the original.
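As a concrete illustration of the frequency-domain model, the following Python sketch (not the paper's code) forms a blurred image via G(u,v) = H(u,v)F(u,v) and inverts it with a simple Wiener-style regularized deconvolution; the centered-PSF convention and the regularization constant k are assumptions made for the sketch:

```python
import numpy as np

def blur_freq(f, h, noise=None):
    """Blur f with a centered PSF h (same shape as f) via frequency-domain
    multiplication: G = H * F (+ N)."""
    F = np.fft.fft2(f)
    # ifftshift moves the PSF center to the (0, 0) corner expected by the FFT
    H = np.fft.fft2(np.fft.ifftshift(h))
    g = np.real(np.fft.ifft2(H * F))
    return g if noise is None else g + noise

def wiener_deconv(g, h, k=1e-3):
    """Estimate f from g and a known PSF h; k regularizes frequencies where
    |H| is small, which is exactly where plain inversion becomes unstable."""
    G = np.fft.fft2(g)
    H = np.fft.fft2(np.fft.ifftshift(h))
    F_hat = np.conj(H) * G / (np.abs(H) ** 2 + k)
    return np.real(np.fft.ifft2(F_hat))
```

With a measured in-focus PSF, this kind of regularized inversion gives the baseline that the TV-constrained model described later improves upon.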
Based on the principle of spectrum acquisition using a single-lens imaging system, the spectral information captured by the CCD includes both in-focus and out-of-focus images. In other words, the imaging process involves convolving the in-focus spectral image with the in-focus PSF and adding the convolutions of the out-of-focus spectral images with their out-of-focus PSFs. This process is challenging to solve directly, as it involves a large number of two-dimensional convolutions. By working in the frequency domain, the computational complexity of the convolutions can be reduced. If we do not consider the influence of noise, it can be expressed as follows:

G(u,v) = H(u,v)F(u,v)

F(u,v) represents the Fourier transform of the original image, H(u,v) represents the Fourier transform of the point spread function (PSF), and G(u,v) represents the Fourier transform of the mixed image obtained by the CCD.
Assuming there are N spectral segments, the image obtained at the focus of the k-th segment can be represented as:

G_k(u,v) = F_k(u,v)H_k,k(u,v) + Σ_{j≠k} F_j(u,v)H_j,k(u,v)

where H_j,k(u,v) is the Fourier transform of the PSF with which band j is imaged at the focal position of band k.
Blurring in the collected images has two causes: (1) spatial blurring caused by the point spread function (PSF) and (2) defocus blurring from neighboring spectral segments. In practical situations, as the wavelength separation from the k-th spectral segment increases, the impact of defocusing decreases and can even be neglected; the real impact comes from adjacent spectral segments. Both spatial blurring and spectral defocusing occur simultaneously.
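The band-mixing model described above can be sketched as follows: each band contributes its own PSF (in-focus for band k, defocused for the others), and the detector records the sum. The per-band PSF list and the frequency-domain implementation are illustrative assumptions:

```python
import numpy as np

def mixed_band_image(bands, psfs_to_focus_k):
    """Detector image at the focal position of band k: every band j is
    convolved with its own centered PSF (in-focus for j = k, defocused
    otherwise) and the contributions are summed on the sensor."""
    G = np.zeros_like(np.fft.fft2(bands[0]))  # complex accumulator
    for f_j, h_jk in zip(bands, psfs_to_focus_k):
        G += np.fft.fft2(np.fft.ifftshift(h_jk)) * np.fft.fft2(f_j)
    return np.real(np.fft.ifft2(G))
```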
The key to image restoration lies in the inversion of matrix H. However, H usually exhibits ill conditioning. Here, the concept of the condition number is introduced, which measures the sensitivity of the solution x of the equation Ax = b to errors in b. It is expressed as:

cond(A) = ‖A‖ · ‖A⁻¹‖
If a small perturbation in matrix A causes only a small perturbation in solution vector x, then matrix A is said to be well conditioned. If it causes a large perturbation in x, it is considered to be ill conditioned. It is evident that even a tiny perturbation in matrix H can have a significant impact on the restored image, making matrix H ill conditioned. As a result, it is not directly invertible. In the field of mathematics, inversion processes and deconvolution are both considered inverse problems, which are generally ill posed. In other words, a slight perturbation can lead to a severe deviation in the final solution. To address this issue, more prior information is needed for constraints to be imposed, and appropriate solution methods must be chosen to obtain stable approximate solutions. This approach is known as regularization.
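A small numerical illustration of conditioning: for an ill-conditioned matrix, a perturbation of 10⁻⁴ in b changes the solution by order one, which is why H cannot simply be inverted (the 2×2 matrices below are illustrative, not the system matrix of the paper):

```python
import numpy as np

# cond(A) = ||A|| * ||A^-1||: a well- and an ill-conditioned 2x2 example
A_good = np.array([[2.0, 0.0], [0.0, 1.0]])    # cond = 2
A_bad  = np.array([[1.0, 1.0], [1.0, 1.0001]])  # cond ~ 4e4

b = np.array([2.0, 2.0001])
x = np.linalg.solve(A_bad, b)  # exact solution: x = [1, 1]
x_pert = np.linalg.solve(A_bad, b + np.array([0.0, 1e-4]))
# the 1e-4 perturbation in b moves the solution to [0, 2]:
# an O(1) change, amplified by the condition number
```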
In 1992, Rudin et al. proposed the total variation (TV) regularization model, initially applied to image denoising and later widely used in various image restoration tasks. Its expression is as follows:

TV(f) = Σ_i √((D_x f)_i² + (D_y f)_i²)

In the above equation, D_x and D_y are first-order gradient operators in the x and y directions, respectively, and i indexes the pixel position.
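The TV term, and its sum over spectral bands used by the multichannel variant discussed next, can be computed with forward differences; the replicated-edge handling here is an implementation choice, not specified by the model:

```python
import numpy as np

def tv(f):
    """Isotropic total variation: sum_i sqrt((Dx f)_i^2 + (Dy f)_i^2),
    using forward differences with replicated edges."""
    dx = np.diff(f, axis=1, append=f[:, -1:])  # Dx f
    dy = np.diff(f, axis=0, append=f[-1:, :])  # Dy f
    return float(np.sum(np.sqrt(dx ** 2 + dy ** 2)))

def mtv(cube):
    """Multichannel TV of a cube (band, y, x): TV of each band k, summed."""
    return sum(tv(cube[k]) for k in range(cube.shape[0]))
```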
MTV (multichannel total variation) is a multi-channel model that calculates the gradient at each pixel of every channel individually, expressed as [28]:

MTV(f) = Σ_k Σ_i √((D_x f_k)_i² + (D_y f_k)_i²)
where k represents the spectral segment. Using MTV, we can restore the aliased spectral images of a single-lens imaging system by solving:

min_f (1/2)‖Hf − g‖₂² + a · MTV(f)
where a represents the regularization coefficient, which balances the weights of the first and second terms. To solve the above equation, the alternating direction method of multipliers (ADMM) [29,30,31] can be applied, which solves optimization problems of the following form:

min_{x,z} f(x) + g(z)   s.t.   Ax + Bz = c
where x ∈ R^n and z ∈ R^m are the two variables to be optimized, A ∈ R^{p×n}, B ∈ R^{p×m}, and c ∈ R^p.
If the functions f(x) and g(z) in the above expression are convex, the variables x and z can be separated, which means that the optimization problem can be decomposed into two separate optimization problems, one for each variable. The optimization process alternates between these variables until the optimal solution is obtained. The augmented Lagrangian function for the objective above is formed by introducing a quadratic penalty term and is given as follows:

L_μ(x, z, y) = f(x) + g(z) + yᵀ(Ax + Bz − c) + (μ/2)‖Ax + Bz − c‖₂²
where μ represents the penalty parameter, which takes a positive value, and y is the dual variable. The alternating optimization proceeds by optimizing x and z in turn and then updating y; it can be expressed as follows:

x^{k+1} = argmin_x L_μ(x, z^k, y^k)
z^{k+1} = argmin_z L_μ(x^{k+1}, z, y^k)
y^{k+1} = y^k + μ(Ax^{k+1} + Bz^{k+1} − c)
Scaling y, we define b = y/μ, which results in:

L_μ(x, z, b) = f(x) + g(z) + (μ/2)‖Ax + Bz − c + b‖₂² − (μ/2)‖b‖₂²
Therefore, the iterative process can be updated as follows:

x^{k+1} = argmin_x f(x) + (μ/2)‖Ax + Bz^k − c + b^k‖₂²
z^{k+1} = argmin_z g(z) + (μ/2)‖Ax^{k+1} + Bz − c + b^k‖₂²
b^{k+1} = b^k + Ax^{k+1} + Bz^{k+1} − c
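As a minimal, runnable instance of these scaled-form updates, the sketch below applies ADMM to min_x (1/2)‖x − v‖² + λ‖z‖₁ subject to x = z, whose closed-form solution is soft-thresholding; the choice of problem is illustrative, not the paper's restoration model:

```python
import numpy as np

def soft(u, t):
    """Soft-thresholding: the proximal operator of t * ||.||_1."""
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

def admm_l1_denoise(v, lam=1.0, mu=1.0, iters=300):
    """Scaled-form ADMM (b = y/mu) for
    min 1/2||x - v||^2 + lam * ||z||_1   s.t.   x - z = 0."""
    x = np.zeros_like(v); z = np.zeros_like(v); b = np.zeros_like(v)
    for _ in range(iters):
        x = (v + mu * (z - b)) / (1.0 + mu)  # quadratic x-subproblem
        z = soft(x + b, lam / mu)            # l1 z-subproblem
        b = b + x - z                        # scaled dual update
    return z
```

The iterate converges to soft(v, λ), matching the known closed form, which makes this a quick correctness check for any ADMM implementation.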
The above reconstruction model only provides prior information for the spatial dimension and does not consider prior information for the spectral dimension. Therefore, the key to the restoration algorithm is how to introduce effective spectral prior information. The total variation (TV) model mainly deals with gradient information, so spectral gradient information is introduced as prior information. In terms of spatial gradient information, regions with large gradient values represent the edges of an image. However, the total variation constraint is essentially the L1 norm of the image gradient, which may lead to some larger gradient values (such as edge information) not being well preserved, resulting in a certain degree of edge blurring in the reconstructed image.
Therefore, we optimize gradient information by implementing the following improvements:
where D_z represents the gradient operator for the spectral dimension; Dx_avg, Dy_avg, and Dz_avg represent the average gradients for the spatial and spectral dimensions; M and N denote the spatial size of an image; and S represents the number of spectral bands. By pre-processing the gradient information and strengthening the constraints on it, the model becomes more closely aligned with the original image's gradient information.
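The spatial and spectral gradient statistics described above can be sketched as follows; the exact normalization used in the paper's improved prior is not reproduced here, so the per-voxel mean of absolute forward differences is an assumption:

```python
import numpy as np

def avg_gradients(cube):
    """Forward-difference gradients of a spectral cube of shape (S, M, N)
    along x, y, and the spectral axis z, averaged over all S*M*N voxels
    to give (Dx_avg, Dy_avg, Dz_avg); the averaging convention is
    illustrative, not the paper's exact definition."""
    S, M, N = cube.shape
    Dz = np.diff(cube, axis=0, append=cube[-1:, :, :])  # spectral gradient
    Dy = np.diff(cube, axis=1, append=cube[:, -1:, :])
    Dx = np.diff(cube, axis=2, append=cube[:, :, -1:])
    n = S * M * N
    return (np.abs(Dx).sum() / n, np.abs(Dy).sum() / n, np.abs(Dz).sum() / n)
```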
The resulting restoration model is as follows:
Based on the above, the spectral restoration model can be finally represented as:
According to the aforementioned alternating direction method of multipliers (ADMM), the model is non-differentiable. Therefore, the solution is decomposed into multiple sub-problems by introducing the intermediate variables w_i = P_i f, i = 1, 2, …, n². The problem is thus transformed into:
The augmented Lagrangian function L(f, w_i, μ_i) for the above function is constructed as follows:
Scaling parameter μ, we obtain:
In the above equation, b_i = μ_i/β represents the scaled Lagrange multiplier. In each iteration, only one variable is optimized while all the others are fixed, and the iterations alternate to update each variable in turn.
To verify the feasibility of the proposed algorithm, the image restoration quality was evaluated using the root-mean-square error (RMSE), peak signal-to-noise ratio (PSNR), and structural similarity index (SSIM) [32].
Let y be the restored image and y′ be the original image, both with size M × N.
RMSE measures the deviation between the restored image and the original image by first computing the mean squared error (MSE) and then taking its square root. A lower RMSE indicates a better restoration result. The formula for calculating RMSE is as follows:

RMSE = √( (1/(M·N)) Σ_{i=1}^{M} Σ_{j=1}^{N} (y(i,j) − y′(i,j))² )

PSNR is an objective criterion for evaluating images, where a higher value indicates a better restoration result. The formula for calculating PSNR is:

PSNR = 10 · log₁₀(L² / MSE)

SSIM utilizes the structural relationship of images to evaluate their similarity at a deeper level. In practice, it applies the mean and variance of the image matrices, using the mean to describe luminance and the variance to describe contrast, and finally uses the covariance between the matrices to represent similarity. The expression for SSIM is:

SSIM = ((2μ_r μ_o + A)(2σ_ro + B)) / ((μ_r² + μ_o² + A)(σ_r² + σ_o² + B))

In the above equation, A and B are constants related to the pixel range of the image, where A = (0.01L)² and B = (0.03L)², with L being the maximum pixel value (e.g., 255 for 8-bit images). μ_r and μ_o represent the mean values of the restored image and the original image, respectively; σ_r and σ_o represent their standard deviations; and σ_ro represents the covariance between the restored image and the original image.
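The three metrics can be implemented directly from their definitions; note that the SSIM below is computed globally over the whole image, whereas common implementations average it over local windows (an implementation choice, not specified above):

```python
import numpy as np

def rmse(y, y_ref):
    """Root-mean-square error between restored y and reference y_ref."""
    d = np.asarray(y, float) - np.asarray(y_ref, float)
    return float(np.sqrt(np.mean(d ** 2)))

def psnr(y, y_ref, L=255.0):
    """Peak signal-to-noise ratio in dB, with peak value L."""
    mse = np.mean((np.asarray(y, float) - np.asarray(y_ref, float)) ** 2)
    return float(10.0 * np.log10(L * L / mse))

def ssim_global(y, y_ref, L=255.0):
    """Single-window SSIM with A = (0.01L)^2 and B = (0.03L)^2."""
    A, B = (0.01 * L) ** 2, (0.03 * L) ** 2  # stabilizing constants
    y = np.asarray(y, float); r = np.asarray(y_ref, float)
    mu_y, mu_r = y.mean(), r.mean()
    var_y, var_r = y.var(), r.var()
    cov = ((y - mu_y) * (r - mu_r)).mean()
    return ((2 * mu_y * mu_r + A) * (2 * cov + B)) / (
        (mu_y ** 2 + mu_r ** 2 + A) * (var_y + var_r + B))
```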
For evaluating spectral restoration quality, the spectral correlation coefficient (SCC) and spectral mean square error (SMSE) were used [33,34].
Let u represent the original spectral data and v represent the reconstructed spectral data, with n denoting the data size. Descriptions of the various evaluation methods are as follows:
The formula for calculating SCC is:

SCC = Σ_i (u_i − ū)(v_i − v̄) / √( Σ_i (u_i − ū)² · Σ_i (v_i − v̄)² )

where ū and v̄ represent the mean values of the original spectral data and the reconstructed spectral data, respectively. The SCC (spectral correlation coefficient) takes values between −1 and 1, where a larger value indicates higher spectral similarity.
The formula for calculating SMSE is:

SMSE = (1/n) Σ_{i=1}^{n} (u_i − v_i)²
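Both spectral metrics operate on a single pixel's spectrum; SCC is simply the Pearson correlation of the two spectral vectors:

```python
import numpy as np

def scc(u, v):
    """Spectral correlation coefficient: Pearson correlation between the
    original spectrum u and the reconstructed spectrum v."""
    u = np.asarray(u, float); v = np.asarray(v, float)
    du, dv = u - u.mean(), v - v.mean()
    return float((du * dv).sum() / np.sqrt((du ** 2).sum() * (dv ** 2).sum()))

def smse(u, v):
    """Spectral mean square error over the n spectral samples."""
    u = np.asarray(u, float); v = np.asarray(v, float)
    return float(np.mean((u - v) ** 2))
```

Because SCC is scale- and offset-invariant, it rewards matching the spectral shape, while SMSE penalizes absolute radiance errors; the two are complementary.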
5. Experiment
According to the imaging principle of a single-lens system, the object distance is generally required to be greater than twice the focal length of the system. This way, the obtained image is a reduced real image. In order to reduce the length of the imaging system, in this study, we chose the following lens parameters: diameter of 30 mm, focal length of 60 mm @ 587.6 nm, and N-BK7 material. The detector parameters were 1024 × 1024 pixels with a sensor size of 12.7 mm × 12.7 mm.
The PSF measurement system was set up as shown in Figure 7. In the experiment, a monochromator was used to illuminate a pinhole, and a CCD was used to capture an image of the pinhole. By moving the CCD, the clearest image of the pinhole was obtained, which represents the in-focus PSF for that spectral band. Then, by adjusting the wavelength, PSFs at different degrees of defocus for the other spectral bands relative to this band were obtained. Through this method, in-focus and defocused PSFs for the different spectral bands were acquired.
As shown in Figure 8a, the setup for the PSF measurement experiment used a pinhole with a diameter of 0.1 mm. The CCD exposure time for all measurements was set to 16.69 ms, as shown in Figure 8b.
In this study, PSFs for the spectral range of 0.520 μm to 0.590 μm were measured with a sampling interval of 0.010 μm, as shown in Figure 9. The experimental setup is shown in Figure 10. A monochromator was used as the single-wavelength light source, and an LED (light-emitting diode) was used as the polychromatic light source. The object distance was set to 42.5 cm. When using the polychromatic light source, the center brightness of the LED was too high, resulting in an overexposed center region and relatively dark image edges. To address this, a mirror was used to reflect the light, making the illumination more uniform. The object captured in the images is the uppercase letter “E”.
Figure 11 shows the experimental restoration results for the spectral range from 0.520 μm to 0.590 μm, where (a) to (h) represent the results for the different wavelengths. From left to right are the original image, the captured image, the MTV-restored image, and the image restored using the algorithm proposed in this study.
An evaluation of these results using the quality assessment metrics is shown in Table 3.
Taking the original image at a wavelength of 0.520 μm as an example, points A and B were selected, as shown in Figure 12. Spectral restoration was performed on points A and B, and the results are shown in Figure 13.
From a comparison of the restoration results, it can be observed that the spectral restoration quality of the algorithm proposed in this study is superior to that of the MTV algorithm. For point A, the restored spectrum using the algorithm in this study and the MTV-restored spectrum both exhibit the same trend as the original spectrum, but the similarity of the algorithm in this study is higher. For point B, the MTV algorithm’s restored spectrum is completely distorted, while the restored spectrum using the algorithm proposed in this study shows a high similarity to the original spectrum.
An evaluation of the quality of the restored spectral data is shown in Table 4.
Based on the above table, it can be observed that for both spectral similarity and spectral root-mean-square error, the restoration quality of the algorithm proposed in this study is higher than that of the MTV algorithm for points A and B.