Article

Single-Pixel Imaging Based on Enhanced Multi-Network Prior

1 Xi’an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi’an 710119, China
2 University of Chinese Academy of Sciences, Beijing 100049, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(14), 7717; https://doi.org/10.3390/app15147717
Submission received: 13 May 2025 / Revised: 23 June 2025 / Accepted: 8 July 2025 / Published: 9 July 2025
(This article belongs to the Section Computing and Artificial Intelligence)

Abstract

Single-pixel imaging (SPI) is a significant branch of computational imaging. Owing to its high sensitivity, low cost, and wide spectral range, it has found extensive applications across various domains. Nevertheless, the need for multiple measurements and long reconstruction times constrains its application. Neural networks have significantly improved reconstruction quality, but there is still considerable room for improvement. SAE and Unet offer different advantages in the field of SPI; however, no existing method combines the advantages of these two networks for SPI reconstruction. Therefore, we propose EMNP-SPI, an SPI reconstruction method that combines the SAE and Unet networks. The SAE makes use of the dimension information of the measurement values and uses the group inverse to obtain the decoding matrix, enhancing its generalization. The Unet uses convolution kernels of different sizes and attention mechanisms to enhance its feature extraction capability. Simulations and experiments confirm that the proposed enhanced multi-network prior method significantly improves image reconstruction quality at low measurement rates.

1. Introduction

Single-pixel imaging (SPI) is also known as computational ghost imaging. Unlike conventional imaging methods that rely on detector pixel arrays, it employs a point detector lacking spatial resolution to capture intensity data. The measured light intensity values are then combined with a corresponding measurement matrix to reconstruct the image. Owing to its high sensitivity, wide spectral range, and low cost, SPI has found successful applications in numerous fields, particularly in scenarios where conventional imaging techniques are either prohibitively expensive or technologically unfeasible [1,2,3,4]. Typical applications include infrared imaging [5], 3D imaging [6], X-ray imaging [7], remote sensing [8], hyperspectral imaging [9], terahertz imaging [10] and super-resolution imaging [11].
Reconstructing high-quality images with few measurements and a short reconstruction time has always been the primary goal of SPI. Limited by hardware, the fastest refresh rate of currently used DMDs is only 22 kHz. There are two primary strategies to achieve faster single-pixel imaging. The first is to boost the modulation speed by designing LED arrays and rotating masks, which operate independently of the speed of the DMD [12,13,14]. The second is to employ advanced algorithms that decrease the number of required measurements [15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42]. In this work, we mainly focus on the advanced algorithms.
With the rapid advancement of deep learning and neural networks, an increasing number of methods have adopted neural networks to facilitate fast imaging and high-quality image reconstruction. Currently, the mainstream methods in this field can be divided into three categories, which will be described in detail below.
The first approach involves end-to-end joint learning of both sampling and reconstruction processes using neural networks [15,16,17,18,19,20,21,22,23,24,25,26,27,28]. In this framework, the trained neural network weights serve as the measurement matrix, typically demonstrating superior performance in terms of both reconstruction speed and quality.
Wang Z. et al. [19] proposed an FSPI under-sampling optimization method based on generative adversarial networks and attention mechanisms. Zhang X. et al. [20] proposed VGNet, a variable generative network enhanced SPI algorithm. Dai Q. et al. [21] proposed MAID-GAN and MAID-GAN+, generative adversarial network single-pixel imaging algorithms that use the measurement values as auxiliary inputs. Lim J. Y. et al. [22] proposed an enhanced SPI reconstruction algorithm using a transformer network with adaptive feature refinement. Woo B. H. et al. [23] proposed an adaptive coarse-to-fine sampling method, which samples progressively according to the image and quality indicators and finally uses a GAN for depth image reconstruction. Song X. et al. [24] proposed a new high-quality Fourier SPI reconstruction method based on a diffusion model. Huang C. et al. [25] proposed DGRN, a diffusion-model single-pixel imaging algorithm with gradient descent guidance. Dong J. et al. [26] proposed DSPIM, a method based on joint training and reconstruction of a conditional diffusion model and an autoencoder network for SPI. Geng Z. et al. [27] proposed a single-pixel computational imaging method based on a multi-input mutually supervised network. The above summarizes research results on end-to-end generative models. Nevertheless, the number of network parameters grows exponentially with the size of the reconstructed image, posing scalability challenges. Neural networks have proven particularly effective in optimizing the single-pixel reconstruction process, and experimental results indicate a significant reduction in system complexity at low measurement rates (MR).
The second approach is an untrained reconstruction network that does not need to be trained on the dataset. It is an enhanced deep learning method based on physical model, which builds a bridge between data-driven and model-driven algorithms to improve the reconstruction quality of single-pixel imaging [28,29,30,31,32,33].
Wang Q. et al. [30] proposed an SPI method based on the Fourier transform using an untrained convolutional network. Although no training dataset is needed, the network parameters are optimized automatically using the network output, the label data, and the physical model; the imaging quality is better than that of other algorithms when the sampling rate is below 12.5%. Li Z. et al. [31] proposed an untrained convolutional autoencoder network for SPI reconstruction. This method likewise requires no pre-training on a dataset, but it optimizes the network parameters through the interaction between the neural network and the physical model. Wang F. et al. [32] proposed a far-field super-resolution GI technique that incorporates the physical model of GI image formation into a deep neural network. The hybrid neural network incorporating the physical model does not require pre-training on any dataset and can reconstruct far-field images with resolutions exceeding the diffraction limit. Lei et al. [33] proposed an untrained attention U-Net for SPI, achieving superior reconstruction quality at sampling rates below 10%. The combination with physical models significantly improves reconstruction quality, but these methods depend heavily on the accuracy of the physical model itself.
The third method uses a traditional or enhanced measurement matrix to linearly measure the original image [34,35,36,37,38,39,40,41,42] and uses a neural network to reconstruct the target image within this linear framework. When the measurement rate is low, the results of traditional algorithms are usually affected by background noise or blurred lines, resulting in poor reconstruction results. Using neural networks improves the quality of reconstruction.
Hu J. et al. [39] proposed an enhanced encoding image similarity method for one-dimensional signals in SPI. Zhao H. et al. [40] proposed OIAE, an algorithm based on an AE network that extracts features for reconstruction. Zeng H. et al. [41] proposed a reconstruction method combining an SAE with multi-channel prior information, in which the SAE extracts prior information at different sampling rates. However, that SAE uses only two fully connected layers, which limits its feature extraction ability. Feng W. et al. [42] proposed a Unet++ SPI reconstruction method based on Unet++ and an attention mechanism, achieving clear image reconstruction at a turbidity level of 80 NTU. Although single-pixel imaging based on an SAE or Unet network has achieved good results, there is no multi-network single-pixel imaging method combining SAE and Unet. Therefore, we propose a multi-network single-pixel imaging method based on SAE and Unet.
In order to make full use of the dimension information of the measurement value, the number of SAE coding network layers is designed to be four layers. In order to enhance its generalization, the group inverse is used to solve the decoding matrix of SAE.
The Unet architecture employs a three-layer network structure, incorporating multi-scale convolutional kernels, sub-pixel convolution, and an attention mechanism to enhance its feature extraction ability.
The reconstructed image is obtained by using the SAE network and Unet to extract feature fusion. In this way, the advantages of both networks are combined to obtain crucial image information and improve the image reconstruction quality. Moreover, our proposed network can be applied to SPI systems by training the sampling matrix in binary form. The effectiveness of the proposed network is verified by simulation and physical experiments.
Our contributions are summarized as follows:
  • We propose an enhanced multi-network prior reconstruction method for SPI using SAE and Unet.
  • The SAE network uses a four-layer structure that exploits the dimension information of the measurement values; the encoding matrix is obtained by training on the dataset, while the decoding matrix is obtained via the group inverse rather than through training.
  • The Unet network uses three different sizes of convolutional kernels to obtain target features of different sizes and uses sub-pixel convolution and attention mechanism to improve the reconstruction quality.
  • We apply the proposed methods in a single-pixel imaging system and verify the effectiveness and feasibility of the proposed methods.

2. Single-Pixel Imaging System

The single-pixel imaging system is shown in Figure 1; it consists of an illumination LED, a collimating pipe, a digital micromirror device (DMD), an FPGA control circuit board, a PMT, and a computer. The pipe contains frosted glass, a reticle, a convex lens, an attenuator, and an aperture. The light emitted by the LED becomes uniform after passing through the frosted glass and illuminates the reticle placed on the focal plane of the convex lens. The light emitted from each point on the reticle becomes parallel after passing through the convex lens and forms a weak single-photon light source after passing through the attenuator. The imaging object is a projected pattern etched on glass. When illuminated by the system’s light source, the imaging object is projected onto the DMD through the lens. The DMD (TI 07XGA DMD, USA) is composed of 1024 × 768 individual micromirrors, each of which is independently controllable to achieve light modulation. The DMD loads the measurement matrix to modulate the spatial light. Each micromirror measures 13.68 μm × 13.68 μm and has two reflection states, +12° and −12°, corresponding to “1” and “0”, respectively. Micromirrors in state “1” reflect the optical signal into the collection optical path, while micromirrors in state “0” reflect the light out of the optical path. The two-dimensional measurement matrix loaded on the DMD thus controls the state of each micromirror to modulate the input light. The modulated optical signal is received by a PMT (Hamamatsu Photonics H10682-110, Japan) operated in photon-counting mode. In this mode, the PMT acts as a point detector, collecting the light intensity from multiple micromirrors on the DMD in a single acquisition and outputting discrete pulses to the FPGA.
A specifically designed FPGA control and counting circuit helps to load the binary measurement matrix into the DMD controller and calculates the single-photon pulse output of the PMT for each measurement.
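As a rough sketch of this acquisition model (the 8 × 8 scene and 16 patterns are illustrative toy sizes, not the system's actual 1024 × 768 resolution or photon-counting statistics), each binary DMD pattern selects a subset of mirrors and the detector records the total reflected intensity:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 8x8 scene and 16 binary DMD patterns (measurement rate 16/64).
image = rng.random((8, 8))
patterns = rng.integers(0, 2, size=(16, 8, 8))  # micromirror states: 1 = reflect into the collection path

# The point detector records one intensity per pattern: the summed light
# from all "on" mirrors in a single acquisition.
measurements = np.array([np.sum(p * image) for p in patterns])
```

Reconstruction then amounts to inverting this linear measurement process, which is the subject of Section 3.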

3. Multi-Network Prior Reconstruction Method

3.1. Compressed Sensing

Compressed sensing, also known as compressive sampling or sparse sampling, is a method of finding sparse solutions to underdetermined systems. The image reconstruction process solves the underdetermined equation y = Ax given the measurement matrix A and the known measurement values y, and is widely used in single-pixel imaging. Denoting the prior information by prior(x), the reconstruction model can be expressed as:

$$\min_x \|Ax - y\|_2 + L \cdot \mathrm{prior}(x),\tag{1}$$

where $x \in \mathbb{R}^{mn}$ is the reconstructed image, $A \in \mathbb{R}^{r \times mn}$ represents a partially sampled random matrix, $y \in \mathbb{R}^{r}$ represents the original data obtained in the single-photon imaging system, and $L$ is a hyperparameter that balances the effects of image fidelity and the prior constraint.
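To make the objective concrete, here is a minimal sketch of the reconstruction model above, with an l1 norm standing in for the generic prior term (the matrix sizes, random seed, and the l1 choice are illustrative assumptions, not the paper's actual prior):

```python
import numpy as np

rng = np.random.default_rng(1)
mn, r = 64, 16                      # mn image pixels, r measurements (MR = 0.25)
A = rng.standard_normal((r, mn))    # partially sampled random matrix A
x_true = rng.random(mn)
y = A @ x_true                      # measurement values y from the system

def objective(x, L=0.1):
    # Data fidelity ||Ax - y||_2 plus the weighted prior term;
    # an l1 norm stands in here for prior(x).
    return np.linalg.norm(A @ x - y) + L * np.sum(np.abs(x))
```

Evaluated at the true image, the fidelity term vanishes and only the weighted prior remains, which is why the hyperparameter L controls how strongly the prior shapes the solution.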

3.2. Sparse Autoencoder Network

The decision to choose a sparse autoencoder as a prior constraint for image reconstruction comes from previous works [43,44,45], in which Alain et al. proposed the sparse autoencoder. A connection has been established between the output $D_x(u)$ of the sparse autoencoder network and the true data density $p(u)$, as described by the following Equation (2):

$$D_x(u) = \frac{\int (u-\tau)\, g_{\sigma_\tau}(\tau)\, p(u-\tau)\, d\tau}{\int g_{\sigma_\tau}(\tau)\, p(u-\tau)\, d\tau},\tag{2}$$

where $g_{\sigma_\tau}(\tau)$ is a Gaussian kernel with a standard deviation of $\sigma_\tau$. This kernel is a smooth function characterized by rotational symmetry and translational invariance.
It can be seen from Equation (2) that the network output Dx(u) is the weighted average of the images in the input neighborhood. In other words, the neural network can generate a reconstructed image that is very similar to the original image by considering the probability of pixel values in the input image region and the noise in the image. This means that there is a smooth and weighted relationship between the output image of the neural network and the regional pixels of the original image, rather than a simple one-to-one correspondence between the pixels. This equation explains the basic principle of neural network image reconstruction mathematically and clarifies the precondition of neural network as prior information. As prior neural network, it can adjust the relationship between regional pixels.
In addition, as the network iteratively minimizes the loss D(x) − x, D(x) gradually aligns with the input image x. The variation of D(x) can be obtained by differentiating both sides of Equation (2), yielding Equation (3). This equation indicates that the error D(x) − x of the autoencoder is proportional to the gradient of the logarithm of the smoothed density:

$$D(x) - x = \sigma_\tau^2\, \nabla \log\left[g_{\sigma_\tau} * p\right](x),\tag{3}$$
When the activation rate of the intermediate hidden layer in the sparse autoencoder is limited, it increases the compression degree of the image, improves the modeling of local rules, and eliminates some noise in the image. The parameters of sparse autoencoders are reduced, and the resources consumed are also reduced, making it easier to train. In addition, the feature limit learned through sparse autoencoders is interpretable and more suitable for processing image noise.
The structure of the SAE network is shown in Figure 2. The SAE uses a four-layer network with two encoding and two decoding layers. To make full use of the measurement dimension information, the first encoding layer maps the input dimension to the initial dimension minus the measurement dimension, and the second encoding layer maps that output to the measurement dimension. To enhance the generalization of the SAE network, the two encoding matrices are multiplied to form the network encoding matrix A, and the network decoding matrix B is obtained by solving the group inverse of A.
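A minimal numerical sketch of this construction, using NumPy's Moore–Penrose pseudoinverse as a stand-in for the group inverse (the layer dimensions and random weights here are illustrative; in the paper the encoding layers are learned from the dataset):

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 64, 16          # input dimension and measurement dimension
m = n - r              # intermediate dimension: initial dim minus measurement dim

# Two encoding layers (random stand-ins for trained weights); their product
# forms the overall encoding matrix A, as in the four-layer SAE above.
W1 = rng.standard_normal((m, n)) / np.sqrt(n)   # layer 1: n -> n - r
W2 = rng.standard_normal((r, m)) / np.sqrt(m)   # layer 2: n - r -> r
A = W2 @ W1

# Decoding matrix B from a generalized inverse of A, obtained by solving
# rather than by training a separate decoder.
B = np.linalg.pinv(A)
```

The defining identity of a generalized inverse, A B A = A, holds by construction, so B decodes any measurement consistent with the encoder without having been fit to the data.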

3.3. Unet Network

The structure of the Unet is shown in Figure 3. To improve the feature extraction ability of the network, it is designed in three respects: (1) In encoding, three down-sampling branches perform multi-feature extraction on the image along two axes: different feature map scales and the different receptive fields of different convolution kernels. Each down-sampling branch contains three down-sampling residual blocks. Although the three branches share the same structure, their convolution kernel sizes are 3, 5 and 7, respectively, and their weights are not shared. (2) The decoder is composed of three up-sampling sub-pixel blocks; sub-pixel interpolation with a magnification of 2 is used instead of deconvolution to increase resolution. Sub-pixel interpolation combines the pixel values at corresponding positions of four adjacent channels into a 2 × 2 pixel block in order. (3) A residual network and attention mechanism are also used to improve reconstruction efficiency. A residual structure usually adds the shallowest information of the network directly to its deepest layer, but the large information gap between the two may increase the training difficulty. Therefore, the shallow layers of the network are fused with its middle layers, and the fused information is sent to the next up-sampling module. Each up-sampling module contains sub-pixel convolution, the Mish activation function and an attention mechanism. The sub-pixel convolution increases image resolution while avoiding the blurring caused by traditional interpolation methods; the attention mechanism focuses on key information, dynamically allocates weights, and enhances the interpretability of the model; and the residual connections accelerate convergence and avoid vanishing or exploding gradients.
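The sub-pixel interpolation step can be sketched in a few lines: four adjacent channels are rearranged into 2 × 2 pixel blocks, exactly the channel-to-space mapping described above (a NumPy sketch, independent of any particular deep learning framework):

```python
import numpy as np

def pixel_shuffle(x, r=2):
    """Sub-pixel rearrangement (C*r*r, H, W) -> (C, H*r, W*r): the pixel
    values at corresponding positions of r*r adjacent channels fill each
    r x r output block in order."""
    c2, h, w = x.shape
    c = c2 // (r * r)
    x = x.reshape(c, r, r, h, w)    # split the channel axis into an r x r block
    x = x.transpose(0, 3, 1, 4, 2)  # interleave block rows/cols with H and W
    return x.reshape(c, h * r, w * r)

# Four 1x1 channels become one 2x2 image, filled in row-major block order.
out = pixel_shuffle(np.arange(4, dtype=float).reshape(4, 1, 1))
```

Because the rearrangement only moves existing values, it doubles the resolution without the smoothing artifacts that interpolation-based up-sampling introduces.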

3.4. Single-Pixel Imaging Based on EMNP

There are some limitations in extracting prior information from single-channel networks. SAE and Unet have different advantages in single-pixel imaging reconstruction. SAE has a strong ability to extract global information, and Unet has a strong ability to extract detailed information. Therefore, multi-network single-pixel reconstruction selects SAE for preliminary reconstruction and Unet for deep reconstruction. The output of SAE is concatenated with the output of Unet, and the result of the concatenation is output after fusion. The training process will adjust the fusion weights of the two networks. The fusion of the two networks uses an adaptive regularization coefficient adjustment method (ARCA) [26]. The EMNP-SPI reconstruction block diagram is shown in Figure 4.
The loss function uses the MSE loss. During training, we minimize the discrepancy between the multi-network output D(x) and the input x, as given by Equation (4):

$$\mathrm{Loss}_{MSE} = \mathbb{E}_x\left[\|D(x) - x\|^2\right]\tag{4}$$
The total loss refers to the loss between the final output and the target, the SAE loss refers to the loss between the SAE network output and the target, and the Unet loss refers to the loss between the Unet network output and the target. To exploit the advantages of the different networks, the loss function is the sum of the total loss, the SAE loss and the Unet loss, with adjustable coefficients on the two network losses, as shown in Equation (5):

$$\mathrm{Loss}_{final} = \mathrm{Loss}_{total} + a\,\mathrm{Loss}_{SAE} + b\,\mathrm{Loss}_{Unet}\tag{5}$$
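A minimal sketch of these two loss definitions, using the weights a = b = 0.3 selected in Section 4.1 (the toy arrays are illustrative):

```python
import numpy as np

def mse(pred, target):
    # MSE loss between a network output and the target, as in Equation (4).
    return np.mean((pred - target) ** 2)

def final_loss(fused, sae_out, unet_out, target, a=0.3, b=0.3):
    # Equation (5): total (fused) loss plus the weighted SAE and Unet losses;
    # a = b = 0.3 are the optimal values found in Section 4.1.
    return mse(fused, target) + a * mse(sae_out, target) + b * mse(unet_out, target)

# Toy example: every output is off by 1, so each MSE term equals 1.
loss = final_loss(np.ones(4), np.ones(4), np.ones(4), np.zeros(4))
```

Keeping the per-network terms in the loss forces both branches to remain individually useful rather than letting the fusion layer ignore one of them.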
There are a variety of solving methods, such as the augmented Lagrange multiplier method (ALM), the alternating direction method of multipliers (ADMM), and the gradient descent method [46,47,48]. ALM and ADMM decompose the problem into minimization subproblems without penalty terms; the number of iterations is very large, so the reconstruction time grows exponentially with the size of the image. The reconstruction time of the gradient descent method is much shorter than that of the other two, so we choose gradient descent to solve the above formula. The solving steps are shown in Algorithm 1.
Algorithm 1. The algorithm steps for solving with the gradient descent method
1:    Initialization: $x_0$
2:    for $k = 1, 2, \ldots, K$ do
3:        $\nabla f(x_k) \approx \dfrac{f(x_k + h) - f(x_k - h)}{2h}$
4:        if $\|\nabla f(x_k)\| < loss\_max$ then stop iterating
5:        update $x_{k+1} = x_k - \nabla f(x_k) \times learning\_rate$
6:    end for
7:    Output: $x_{k+1}$
The key to the gradient descent method is the accuracy of the gradient computation. There are two traditional methods for calculating gradients: the analytical method and the numerical method. The analytical method derives a closed-form expression of the gradient function; however, such an expression is difficult to obtain for most functions, and the derivation is error-prone. Considering the accuracy of gradient calculation, we choose the numerical method, which estimates the derivative from the function's response to an infinitesimal change of the variable.
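The numerical gradient and the descent loop of Algorithm 1 can be sketched as follows (the step size, learning rate, and the toy quadratic objective are illustrative assumptions):

```python
import numpy as np

def numerical_gradient(f, x, h=1e-5):
    # Central-difference estimate, as in step 3 of Algorithm 1: each partial
    # derivative is approximated by (f(x + h) - f(x - h)) / (2h).
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        grad[i] = (f(x + e) - f(x - e)) / (2 * h)
    return grad

def gradient_descent(f, x0, learning_rate=0.1, max_iter=1000, grad_tol=1e-6):
    x = x0.astype(float)
    for _ in range(max_iter):
        g = numerical_gradient(f, x)
        if np.linalg.norm(g) < grad_tol:   # stopping criterion of Algorithm 1
            break
        x = x - learning_rate * g          # update x_{k+1} = x_k - lr * grad
    return x

# Minimize a simple quadratic whose minimizer is x = [1, 2].
x_min = gradient_descent(lambda v: np.sum((v - np.array([1.0, 2.0])) ** 2),
                         np.zeros(2))
```

The central difference is second-order accurate in h, which is why it is preferred here over a one-sided difference.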
In the research process, it is necessary to use the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) of the reconstructed image and the original image to analyze and discuss the experimental results. The calculation formula is as follows:
$$PSNR = 20 \log_{10} \frac{\mathrm{Max}(\hat{x})}{\|x - \hat{x}\|_2}\tag{6}$$

In Equation (6), $x$ is the original image and $\hat{x}$ is the reconstructed image.
$$SSIM = \frac{(2\mu_x \mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)}\tag{7}$$

In Equation (7), $\mu_x$ is the mean of x, $\mu_y$ is the mean of y, $\sigma_x^2$ is the variance of x, $\sigma_y^2$ is the variance of y, and $\sigma_{xy}$ is the covariance of x and y.
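A direct transcription of these two metrics (interpreting the l2 term in the PSNR formula as the root-mean-square error, as is conventional; the stabilizing constants c1 and c2 are illustrative values, not taken from the paper):

```python
import numpy as np

def psnr(x, x_hat):
    # Equation (6): peak value of the image over the RMSE, in dB.
    rmse = np.sqrt(np.mean((x - x_hat) ** 2))
    return 20 * np.log10(np.max(x) / rmse)

def ssim(x, y, c1=1e-4, c2=9e-4):
    # Equation (7) evaluated globally over the image; c1 and c2 are small
    # stabilizing constants (illustrative values).
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx ** 2 + my ** 2 + c1) * (x.var() + y.var() + c2)
    return num / den
```

For a perfect reconstruction, SSIM equals 1 and PSNR diverges, so both metrics increase with reconstruction quality.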

4. Result and Discussion

In order to verify the effectiveness of EMNP-SPI, we conducted both simulation and practical experiments. The generalization of the proposed method cannot be verified using only one dataset. In this section, we train the EMNP with the flower dataset and natural image dataset (set91 and set5), respectively. Since our method aims to explore the reconstruction accuracy of the image pixel intensity, in order to evaluate the experimental results more intuitively and also to facilitate the comparison with existing methods, we preprocess the images in these datasets, converting the color images into grayscale images for training and testing. In addition, we also performed an ablation experiment, using different structures to reconstruct images at the same MR, proving the necessity of the combination of SAE and Unet.

4.1. Select the Optimal Hyperparameters of EMNP-SPI

These hyperparameters are determined by simulation experiments. We test parameter values ranging from 0.1 to 1 using the flower dataset. Since we are mainly concerned with image reconstruction quality at low measurement rates (MR), the MR is set to 0.05 in the simulation experiments. First, we set a = 1 and vary b from 0.1 to 1; the PSNR of the reconstructed image is shown in Table 1. It can be seen from Table 1 that the PSNR is highest when b is 0.3, so the optimal value of b is 0.3. Then we set b = 0.3 and vary a from 0.1 to 1; the PSNR of the reconstructed image is shown in Table 2. It can be seen from Table 2 that the PSNR is highest when a is 0.3. The simulation experiments thus confirm that the optimal values of the hyperparameters a and b of the EMNP-SPI method are both 0.3.

4.2. Simulation Experiment Results on Flower Dataset

We train with the flower dataset and test with two specified images, the Chinese character “光” (“light”) and the letters “UCAS”. The loss function has two weight parameters to adjust the SAE and Unet loss terms; the weights of the SAE and Unet loss functions are both set to 0.3. Reconstructed images were obtained at measurement rates of 0.01, 0.05, 0.1 and 0.2. We compare the proposed method with the traditional methods TVAL3, one-norm prior and DDPM. The reconstruction result for “光” is shown in Figure 5, and that for “UCAS” in Figure 6. It can be seen that our method is superior to TVAL3, one-norm prior and DDPM in terms of the clarity and brightness of the reconstructed images. The PSNR and SSIM of the images reconstructed by the different methods are shown in Table 3; the PSNR of the proposed method's reconstruction is better than those of TVAL3, one-norm prior and DDPM.

4.3. Simulation Experiment Results on Natural Image Dataset

The ninety-one images of set91 are divided into 64 × 64 patches with a stride of 16; ten thousand of these patches are used as the training dataset and 500 as the test dataset. After training on this natural image dataset, set5 is used for testing. The five images of the set5 dataset are divided into 64 × 64 patches with a stride of 64. If the image length and width are divisible by 64, the patches do not overlap; otherwise, some patches overlap. The test image is reconstructed patch by patch, and the patches are stitched back into an image of the original size.
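The patch extraction described above can be sketched as follows (the image sizes here are illustrative):

```python
import numpy as np

def extract_patches(img, size=64, stride=16):
    # Slide a size x size window over the image with the given stride, as
    # done for the set91 training patches (stride 16) and the set5 test
    # patches (stride 64, i.e. non-overlapping when the dimensions divide by 64).
    h, w = img.shape
    return np.array([img[i:i + size, j:j + size]
                     for i in range(0, h - size + 1, stride)
                     for j in range(0, w - size + 1, stride)])

img = np.zeros((128, 128))
train_patches = extract_patches(img, stride=16)  # overlapping training patches
test_patches = extract_patches(img, stride=64)   # non-overlapping test patches
```

With stride 16 a 128 × 128 image yields 25 overlapping patches, whereas stride 64 yields the 4 non-overlapping tiles used at test time.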
In this section, we verify the reconstruction performance of EMNP-SPI on the gray natural image dataset. Table 4 shows the PSNR and SSIM of EMNP-SPI on the set5 dataset. The reconstruction images at MR = 5% are shown in Figure 7, and the comparative simulation results of EMNP-SPI and other reconstruction methods are shown in Figure 8. It can be seen from Figure 7 that a good quality image was reconstructed at a 5% measurement rate.
It can be seen from Figure 8 that EMNP-SPI can reconstruct clear images at MR = 5%, and the reconstruction quality is better than that of other algorithms. TVAL3 can reconstruct a clear image at MR = 8%, the reconstructed image of SAE has partially missing details at MR = 10%, and SPI-GAN has obvious blocks in the reconstructed image with low measurement rate.
We compare the proposed algorithm with TVAL3, SAE, and SPI-GAN. It can be seen from Figure 9 that the EMNP-SPI method is superior to TVAL3 at low MR in PSNR and SSIM. At MR = 5%, PSNR is 4.98 dB higher, and SSIM is 0.11 higher than TVAL3.

4.4. Results of Ablation Experiment

In the proposed method, we use a combination of SAE and Unet for the reconstruction process. To demonstrate the effectiveness and necessity of this design, we conducted an ablation experiment using the SAE, Unet and EMNP models under the same parameters at MR = 0.01, 0.05, 0.08 and 0.1. The result is shown in Figure 10. It is evident that the combination of SAE and Unet significantly enhances the reconstruction quality compared with either single network.

4.5. Results of EMNP on SPI System

We have built an SPI system in the laboratory. To meet the hardware requirements, we developed a binary version of EMNP-SPI; the overall architecture of the network remains unchanged while training the binary sampling matrix. Since the DMD can only display 0 and 1, the trained sampling matrix is converted into measurement matrices that the DMD can use: the entries 1 and −1 are mapped to two complementary binary patterns, and the measurement value Y is obtained by subtracting the values measured with these two patterns.
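The differential measurement trick can be sketched as follows (the {1, −1} sampling row and the image are random stand-ins for a trained row of the binary sampling matrix):

```python
import numpy as np

rng = np.random.default_rng(3)
image = rng.random(64)

# A trained {1, -1} sampling row is split into two complementary {0, 1}
# DMD patterns; subtracting the two photon counts recovers the signed
# measurement, since the DMD itself can only display 0 and 1.
row = rng.choice([1.0, -1.0], size=64)
pos = (row == 1.0).astype(float)    # pattern for the +1 entries
neg = (row == -1.0).astype(float)   # pattern for the -1 entries

y = pos @ image - neg @ image       # differential measurement value Y
```

The difference of the two non-negative measurements equals the inner product with the original signed row, so the network's trained matrix is usable on hardware at the cost of two exposures per row.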
We image the target letters “N” and “U” on the resolution board at measurement rates of 0.01, 0.05 and 0.098. The imaging resolution is 64 × 64. The reconstruction result of TVAL3 and EMNP-SPI is shown in Figure 11. We can find that the image reconstructed by EMNP-SPI is clearer. The image quality is better than that of TVAL3, which is consistent with the simulation results.

5. Conclusions

In this work, we propose an enhanced multi-network prior method for single-pixel imaging reconstruction, which utilizes the advantages of SAE and Unet networks to extract different prior information. The SAE network is for primary reconstruction, and the Unet network is for deep reconstruction. In order to make full use of the measurement value dimension information, the SAE network is designed as four layers. In order to enhance the generalization of the SAE network, the decoding matrix is obtained by solving the group inverse of the encoding matrix. The Unet network uses three different sizes of convolution kernels, residual networks and attention mechanisms to improve feature extraction capabilities. Finally, the output of the two networks is fused, and the fusion training process automatically adjusts the weight. Through simulation analysis and experimental verification, the reconstruction effect of our proposed method is better than that of TVAL3. The proposed method is suitable for a variety of scenarios and has development prospects.
This method can reconstruct the image at a 0.05 measurement rate, but at lower measurement rates the reconstruction quality is worse than that of DGRN, MAID-GAN, and other generative models. The reconstruction results of DGRN, MAID-GAN and EMNP-SPI are shown in Figure 12, from which it can be seen that the reconstruction clarity and PSNR of DGRN and MAID-GAN are better than those of our proposed method.
The reason is that generative models have strong feature extraction ability and can reconstruct a clear image from a few measured values. However, the image reconstructed by a generative model differs from the original image, which involves a trade-off between clarity and authenticity. At very low measurement rates, generative models favor clarity over authenticity, whereas our method favors authenticity over clarity. Our method needs further optimization to enhance its feature extraction ability. It improves on the performance of traditional methods at low measurement rates, and we will continue to study in this direction in the future.

Author Contributions

Conceptualization, J.F. and Q.L.; funding acquisition, H.W.; methodology, J.F. and J.D.; investigation, J.F. and Q.Z.; software, J.D.; visualization, J.F. and Q.Z.; writing—original draft preparation, J.F.; writing—review and editing, J.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Science Basic Research Plan in Shaanxi Province of China, grant number 2023-JQ-QC-0714.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

Acknowledgments

Jie Yu contributed to the visualization of the natural image datasets and to stitching the small patches into full-size images.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Duarte, M.F.; Davenport, M.A.; Takhar, D.; Laska, J.N.; Sun, T.; Kelly, K.F.; Baraniuk, R.G. Single-pixel imaging via compressive sampling. IEEE Signal Process. Mag. 2008, 25, 83–91. [Google Scholar] [CrossRef]
  2. Yu, W.; Yao, X.; Liu, X.; Zhai, G.; Zhao, Q. Compressed sensing for ultra-weak light counting imaging. Opt. Precis. Eng. 2012, 20, 2283–2292. [Google Scholar]
  3. Jiao, S.; Feng, J.; Gao, Y.; Lei, T.; Xie, Z.; Yuan, X. Optical machine learning with incoherent light and a single-pixel detector. Opt. Lett. 2019, 44, 5186–5189. [Google Scholar] [CrossRef]
  4. Bai, L.; Liang, Z.; Xu, Z. Study of single pixel imaging system based on compressive sensing. Comput. Eng. Appl. 2011, 47, 116–119. [Google Scholar]
  5. Edgar, M.P.; Gibson, G.M.; Bowman, R.W.; Sun, B.; Radwell, N.; Mitchell, K.J.; Welsh, S.; Padgett, M.J. Simultaneous real-time visible and infrared video with single-pixel detectors. Sci. Rep. 2015, 5, 10669. [Google Scholar] [CrossRef]
  6. Sun, B.; Edgar, M.P.; Bowman, R.; Vittert, L.E.; Welsh, S.; Bowman, A.; Padgett, M.J. 3D computational imaging with single-pixel detectors. Science 2013, 340, 844–847. [Google Scholar] [CrossRef]
  7. Zhang, A.; He, Y.; Wu, L.; Chen, L.; Wang, B. Tabletop X-ray ghost imaging with ultra-low radiation. Optica 2018, 5, 374–377. [Google Scholar] [CrossRef]
  8. Gong, W.; Zhao, C.; Yu, H.; Chen, M.; Xu, W.; Han, S. Three-dimensional ghost imaging lidar via sparsity constraint. Sci. Rep. 2016, 6, 26133. [Google Scholar] [CrossRef] [PubMed]
  9. Studer, V.; Bobin, J.; Chahid, M.; Mousavi, H.S.; Candes, E.; Dahan, M. Compressive fluorescence microscopy for biological and hyperspectral imaging. Proc. Natl. Acad. Sci. USA 2012, 109, E1679–E1687. [Google Scholar] [CrossRef]
  10. Chen, S.; Feng, Z.; Li, J.; Tan, W.; Du, L.; Cai, J.; Ma, Y.; He, K.; Ding, H.; Zhai, Z.; et al. Ghost spintronic THz-emitter-array microscope. Light Sci. Appl. 2020, 9, 99. [Google Scholar] [CrossRef]
  11. Li, W.; Qi, J.; Alu, A. Single-pixel super-resolution with a space-time modulated computational metasurface imager. Photonics Res. 2024, 12, 2311–2322. [Google Scholar] [CrossRef]
  12. Xu, Z.; Chen, W.; Penuelas, J.; Padgett, M.; Sun, M. 1000 FPS computational ghost imaging using LED-based structured illumination. Opt. Express 2018, 26, 2427–2434. [Google Scholar] [CrossRef] [PubMed]
  13. Hahamovich, E.; Monin, S.; Hazan, Y.; Rosenthal, A. Single pixel imaging at megahertz switching rates via cyclic Hadamard masks. Nat. Commun. 2021, 12, 4516. [Google Scholar] [CrossRef]
  14. Zhang, Y.; Li, M.; Zhao, Z.; Liu, X.; Lian, W.; Quan, B.; Wu, L. Robust real-time single-pixel imaging based on a spinning mask via differential detection. Opt. Express 2024, 32, 47216–47224. [Google Scholar] [CrossRef]
  15. Xie, X.; Wang, Y.; Shi, G.; Wang, C.; Du, J.; Han, X. Adaptive Measurement Network for CS Image Reconstruction. Comput. Vis. 2017, 772, 407–417. [Google Scholar]
  16. Mousavi, A.; Patel, A.B.; Baraniuk, R.G. A deep learning approach to structured signal recovery. In Proceedings of the 2015 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton), Monticello, IL, USA, 30 September–2 October 2015. [Google Scholar]
  17. Mousavi, A.; Baraniuk, R.G. Learning to invert: Signal recovery via Deep Convolutional Networks. In Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA, 5–9 March 2017. [Google Scholar]
  18. Liu, Z.; Zhang, H.; Zhou, M.; Jiao, S.; Zhang, X.; Geng, Z. Adaptive Super-Resolution Networks for Single-Pixel Imaging at Ultra-Low Sampling Rates. IEEE Access 2024, 12, 78496–78504. [Google Scholar] [CrossRef]
  19. Wang, Z.; Wen, Y.; Ma, Y.; Peng, W.; Lu, Y. Optimizing Under-Sampling in Fourier Single-Pixel imaging using GANs and attention mechanisms. Opt. Laser Technol. 2025, 187, 112752. [Google Scholar] [CrossRef]
  20. Zhang, X.; Deng, C.; Wang, C.; Wang, F.; Situ, G. VGenNet: Variable Generative Prior Enhanced Single Pixel Imaging. ACS Photonics 2023, 10, 2363–2373. [Google Scholar] [CrossRef]
  21. Dai, Q.; Yan, Q.; Zou, Q.; Li, Y.; Yan, J. Generative adversarial network with the discriminator using measurements as an auxiliary input for single-pixel imaging. Opt. Commun. 2024, 560, 130485. [Google Scholar] [CrossRef]
  22. Lim, J.Y.; Chiew, Y.H.; Phan, R.; Chong, E.; Wang, X. Enhancing single-pixel imaging reconstruction using hybrid transformer network with adaptive feature refinement. Opt. Express 2024, 32, 32370–32386. [Google Scholar] [CrossRef]
  23. Woo, B.H.; Tham, M.L.; Chua, S.Y. Adaptive Coarse-to-Fine Single Pixel Imaging with Generative Adversarial Network Based Reconstruction. IEEE Access 2023, 11, 31024–31035. [Google Scholar] [CrossRef]
  24. Song, X.; Liu, X.; Luo, Z.; Dong, J.; Zhong, W.; Wang, G.; He, B.; Li, Z.; Liu, Q. High-resolution iterative reconstruction at extremely low sampling rate for Fourier single-pixel imaging via diffusion model. Opt. Express 2024, 32, 3138–3156. [Google Scholar] [CrossRef]
  25. Huang, C.; Yan, Q.; Yan, J.; Li, Y.; Luo, X.; Wang, H. Diffusion Model with Gradient Descent Module Guiding Reconstruction for Single-Pixel Imaging. IEEE Photonics J. 2024, 16, 1–10. [Google Scholar] [CrossRef]
  26. Dong, J.; Zeng, H.; Dong, S.; Chen, W.; Li, Q.; Cao, J.; Yan, Q.; Wang, H. Enhanced Single Pixel Imaging by Using Adaptive Jointly Optimized Conditional Diffusion. IEEE Trans. Comput. Imaging 2025, 11, 289–304. [Google Scholar] [CrossRef]
  27. Geng, Z.; Sun, Z.; Chen, Y.; Lu, X.; Tian, T.; Cheng, G.; Li, X. Multi-input mutual supervision network for single-pixel computational imaging. Opt. Express 2024, 32, 13224–13234. [Google Scholar] [CrossRef] [PubMed]
  28. Wang, F.; Wang, C.; Deng, C.; Han, S.; Situ, G. Single-pixel imaging using physics enhanced deep learning. Photonics Res. 2022, 10, 104–110. [Google Scholar] [CrossRef]
  29. Bian, Y.; Wang, F.; Wang, Y.; Fu, Z.; Liu, H.; Yuan, H.; Situ, G. Passive imaging through dense scattering media. Photonics Res. 2024, 12, 134–140. [Google Scholar] [CrossRef]
  30. Wang, Q.; Chen, L.; Shi, H.; Li, H.; Huang, J. Single-pixel imaging with untrained network using fourier transform at low sampling rates. Opt. Lasers Eng. 2025, 186, 108764. [Google Scholar] [CrossRef]
  31. Li, Z.; Huang, J.; Shi, D.; Chen, Y.; Yuan, K.; Hu, S.; Wang, Y. Single-pixel imaging with untrained convolutional autoencoder network. Opt. Laser Technol. 2023, 167, 109710. [Google Scholar] [CrossRef]
  32. Wang, F.; Wang, C.; Chen, M.; Gong, W.; Zhang, Y.; Han, S.; Situ, G. Far-field super-resolution ghost imaging with a deep neural network constraint. Light Sci. Appl. 2022, 11, 1–11. [Google Scholar]
  33. Lei, G.; Lai, W.; Jia, H.; Wang, W.; Wang, Y.; Liu, H.; Cui, W.; Han, K. Low-sampling and noise-robust single-pixel imaging based on the untrained attention U-Net. Opt. Express 2024, 32, 29678–29692. [Google Scholar] [CrossRef]
  34. Dong, W.; Wang, P.; Yin, W.; Shi, G.; Wu, F.; Lu, X. Denoising Prior Driven Deep Neural Network for Image Restoration. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 41, 2305–2318. [Google Scholar] [CrossRef] [PubMed]
  35. Hussain, A.; Ullah, W.; Khan, N.; Khan, A.Z.; Kim, M.J. TDS-Net: Transformer enhanced dual-stream network for video Anomaly Detection. Expert Syst. Appl. 2024, 256, 124846. [Google Scholar] [CrossRef]
  36. Ullah, W.; Hussain, T.; Ullah, F.U.; Muhammad, K.; Hassaballah, M.; Rodrigues, J.J.P.C.; Baik, S.W.; de Albuquerque, V.H.C.; Prakash, S. AD-Graph: Weakly Supervised Anomaly Detection Graph Neural Network. Int. J. Intell. Syst. 2023, 2023, 7868415. [Google Scholar] [CrossRef]
  37. Yuan, H.; Song, H.; Sun, X.; Guo, K.; Ju, Z. Compressive sensing measurement matrix construction based on improved size compatible array LDPC code. IET Image Process 2015, 9, 993–1001. [Google Scholar] [CrossRef]
  38. Xiang, J.; Zang, Y.; Jiang, H.; Wang, L.; Liu, Y. Soft threshold iteration-based anti-noise compressed sensing image reconstruction network. Signal Image Video Process 2023, 17, 4523–4531. [Google Scholar] [CrossRef]
  39. Hu, J.; Min, L.; Guo, Y. Enhancing single-pixel imaging by improving one-dimensional signal through encoded image similarity. Opt. Laser Technol. 2025, 188, 112951. [Google Scholar] [CrossRef]
  40. Zhao, H.; Wu, H.; Wang, X. OIAE: Overall Improved Autoencoder with Powerful Image Reconstruction and Discriminative Feature Extraction. Cogn. Comput. 2023, 15, 1334–1341. [Google Scholar] [CrossRef]
  41. Zeng, H.; Dong, J.; Li, Q.; Chen, W.; Dong, S.; Guo, H.; Wang, H. Compressive Reconstruction Based on Sparse Autoencoder Network Prior for Single-Pixel Imaging. Photonics 2023, 10, 1109. [Google Scholar] [CrossRef]
  42. Feng, W.; Yi, Y.; Li, S.; Xiong, Z.; Xie, B.; Zeng, Z. High turbidity underwater single-pixel imaging based on Unet++ and attention mechanism at a low sampling. Opt. Commun. 2024, 552, 322–336. [Google Scholar] [CrossRef]
  43. Tropp, J.A.; Gilbert, A.C. Signal recovery from random measurements via orthogonal matching pursuit. IEEE Trans. Inf. Theory 2007, 53, 4655–4666. [Google Scholar] [CrossRef]
  44. Alain, G.; Bengio, Y. What Regularized Auto-Encoders Learn from the Data Generating Distribution. J. Mach. Learn. Res. 2014, 15, 3563–3593. [Google Scholar]
  45. Liu, Q.; Yang, Q.; Cheng, H.; Wang, S.; Zhang, M.; Liang, D. Highly undersampled magnetic resonance imaging reconstruction using autoencoding priors. Magn. Reson. Med. 2019, 83, 322–336. [Google Scholar] [CrossRef] [PubMed]
  46. Hestenes, M.R. Multiplier and gradient methods. J. Optim. Theory Appl. 1969, 4, 303–320. [Google Scholar] [CrossRef]
  47. Liu, Q.H.; Shen, X.Y.; Gu, Y.T. Linearized ADMM for Nonconvex Nonsmooth Optimization with Convergence Analysis. IEEE Access 2019, 7, 76131–76144. [Google Scholar] [CrossRef]
  48. Kivinen, J.; Warmuth, M.K. Exponentiated gradient versus gradient descent for linear predictors. Inf. Comput. 1997, 132, 1–63. [Google Scholar] [CrossRef]
Figure 1. Structure of the single-pixel imaging system. After collimation by the parallel light tube, the source becomes horizontal parallel light; the object, under this illumination, is imaged onto the DMD and then collected by the PMT. The imaged object is the Chinese character “光” on a resolution board. The signal processed by the FPGA is the measured value Y, and the reconstructed image is obtained by the EMNP network.
Figure 2. Structure of the SAE network.
Figure 3. Structure of the Unet network. (a) Detailed structure of the Unet network; (b) symbol legend of the Unet network.
Figure 4. Structure of EMNP-SPI.
Figure 5. Reconstruction results for the Chinese character “光” with different methods at different measurement rates. The four networks are trained on the flower dataset, and a 64 × 64 grayscale image of the Chinese character “光” is used as the test image; the reconstruction is a 64 × 64 grayscale image.
Figure 6. Reconstruction results for the letters “UCAS” with different methods at different measurement rates. The four networks are trained on the flower dataset, and a 64 × 64 grayscale image of the letters “UCAS” is used as the test image; the reconstruction is a 64 × 64 grayscale image.
Figure 7. Reconstruction results of the EMNP method at MR = 5%. The set91 dataset is used for training, with each image divided into 64 × 64 grayscale patches; the set5 grayscale images, likewise divided into 64 × 64 patches, are used as test images. The reconstructed 64 × 64 patches are stitched back into full images. (a) Original images in set5; (b) reconstruction results of EMNP-SPI.
Figure 8. Reconstruction results for “baby” using different algorithms. (a) TVAL3; (b) SAE (Reference [41]); (c) SPI-GAN; (d) EMNP-SPI.
Figure 9. Reconstruction quality of different methods. (a) PSNR of different methods on set5; (b) SSIM of different methods on set5.
Figure 10. Comparison of different structures. (a) PSNR results; (b) SSIM results.
Figure 11. Reconstruction results of sampled images in the SPI system. (a,c) Reconstruction results of TVAL3; (b,d) reconstruction results of EMNP-SPI.
Figure 12. Reconstruction results of different methods.
Table 1. PSNR of the reconstructed image as hyperparameter b varies (MR = 0.05).

b    | 0.1     | 0.2     | 0.3     | 0.4     | 0.5
PSNR | 19.8245 | 19.8544 | 19.9447 | 19.9263 | 19.9212
b    | 0.6     | 0.7     | 0.8     | 0.9     | 1.0
PSNR | 19.8925 | 19.7244 | 19.6404 | 19.6163 | 19.7862
Table 2. PSNR of the reconstructed image as hyperparameter a varies (MR = 0.05).

a    | 0.1     | 0.2     | 0.3     | 0.4     | 0.5
PSNR | 19.8425 | 20.0116 | 20.1404 | 20.1263 | 19.9712
a    | 0.6     | 0.7     | 0.8     | 0.9     | 1.0
PSNR | 19.8225 | 19.7244 | 19.4404 | 19.4163 | 19.5712
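The values in Tables 1 and 2 amount to a one-dimensional grid search over each hyperparameter, keeping the setting that maximizes PSNR. A minimal sketch of that selection, using the tabulated values (the variable names are illustrative, not from the paper's code):

```python
# Illustrative sketch: pick hyperparameters a and b that maximize PSNR,
# using the values reported in Tables 1 and 2 (MR = 0.05).
psnr_b = {0.1: 19.8245, 0.2: 19.8544, 0.3: 19.9447, 0.4: 19.9263, 0.5: 19.9212,
          0.6: 19.8925, 0.7: 19.7244, 0.8: 19.6404, 0.9: 19.6163, 1.0: 19.7862}
psnr_a = {0.1: 19.8425, 0.2: 20.0116, 0.3: 20.1404, 0.4: 20.1263, 0.5: 19.9712,
          0.6: 19.8225, 0.7: 19.7244, 0.8: 19.4404, 0.9: 19.4163, 1.0: 19.5712}

# max over dictionary keys, ordered by their PSNR values
best_b = max(psnr_b, key=psnr_b.get)  # -> 0.3
best_a = max(psnr_a, key=psnr_a.get)  # -> 0.3
print(best_a, best_b)
```

Both grids peak at 0.3, which is the operating point implied by the tables.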
Table 3. PSNR and SSIM of different methods at different MR.

Image  | Method   | MR = 0.01  | MR = 0.05  | MR = 0.1   | MR = 0.2
       |          | PSNR/SSIM  | PSNR/SSIM  | PSNR/SSIM  | PSNR/SSIM
“光”   | TVAL3    | 11.97/0.42 | 13.97/0.62 | 19.54/0.77 | 24.61/0.82
       | Norm-1   | 10.46/0.39 | 13.49/0.59 | 18.94/0.75 | 23.39/0.79
       | DDPM     | 11.89/0.41 | 18.86/0.64 | 23.86/0.79 | 27.91/0.85
       | EMNP-SPI | 12.04/0.43 | 19.84/0.68 | 23.98/0.80 | 27.94/0.86
“UCAS” | TVAL3    | 11.68/0.34 | 13.54/0.63 | 19.95/0.74 | 24.87/0.83
       | Norm-1   | 10.46/0.36 | 13.49/0.59 | 18.94/0.72 | 23.91/0.84
       | DDPM     | 11.97/0.38 | 19.56/0.65 | 23.91/0.78 | 27.86/0.86
       | EMNP-SPI | 12.24/0.40 | 19.96/0.67 | 24.12/0.79 | 28.32/0.87
Table 4. PSNR and SSIM of EMNP-SPI at different MR.

Image     | Method   | MR = 3%    | MR = 5%    | MR = 8%    | MR = 10%
          |          | PSNR/SSIM  | PSNR/SSIM  | PSNR/SSIM  | PSNR/SSIM
baby      | EMNP-SPI | 16.23/0.42 | 20.47/0.56 | 22.34/0.76 | 24.68/0.84
bird      | EMNP-SPI | 15.96/0.39 | 19.48/0.52 | 21.74/0.72 | 23.88/0.82
butterfly | EMNP-SPI | 16.14/0.41 | 20.67/0.57 | 22.67/0.78 | 25.24/0.85
head      | EMNP-SPI | 15.93/0.40 | 20.27/0.54 | 22.04/0.74 | 24.82/0.84
lenna     | EMNP-SPI | 16.43/0.43 | 20.87/0.58 | 22.64/0.79 | 24.91/0.87
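The PSNR figures in Tables 1–4 follow the standard definition PSNR = 10·log10(MAX²/MSE). A minimal, self-contained sketch, assuming images normalized to [0, 1] (the function name is illustrative):

```python
import numpy as np

def psnr(ref: np.ndarray, rec: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between a reference and a reconstruction."""
    mse = np.mean((ref.astype(np.float64) - rec.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

# Example: a uniform 0.1 error on a 64 x 64 image gives MSE = 0.01, i.e. 20 dB.
ref = np.zeros((64, 64))
rec = ref + 0.1
print(round(psnr(ref, rec), 4))  # 20.0
```

SSIM, the other metric reported, involves local luminance, contrast, and structure comparisons and is typically computed with a library routine rather than by hand.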
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Feng, J.; Li, Q.; Dong, J.; Zhao, Q.; Wang, H. Single-Pixel Imaging Based on Enhanced Multi-Network Prior. Appl. Sci. 2025, 15, 7717. https://doi.org/10.3390/app15147717
