1. Introduction
The demand for IoT solutions is currently experiencing significant growth, with the IoT community and the majority of IoT terminal markets showing strong positive sentiment. In terms of device scale, a report by IoT Analytics projects that there will be approximately 27 billion IoT devices by 2025 [1]. Vision is considered the most critical and convenient form of perception, and visual data (such as images and videos) have become the preferred medium for information exchange within the IoT. It is reported that approximately 1.81 trillion photos are taken globally each year, and by 2030, the total number of photos taken is expected to reach 28.6 trillion. Of these, an estimated 6% will be shared and transmitted over the IoT to meet various needs. However, the rapid annual growth in the volume of data that IoT devices must process poses a significant challenge: many IoT devices are resource-constrained, with limited data processing capabilities and strict energy consumption requirements. This necessitates the development of new solutions to efficiently handle data and support the execution of intelligent tasks within the IoT. For example, Lin [2] proposed an image compression and reconstruction algorithm based on compressed sensing that addresses these challenges.
Compressed sensing (CS) [3] is a recently developed signal acquisition, processing, and compression technique that breaks through the limitations of the traditional Nyquist/Shannon sampling theorem [4,5]. Since its introduction by Candès, Tao, and Donoho in 2006, this theory has shown that it is possible to recover high-dimensional sparse signals from a small number of linear, non-adaptive measurements by solving an optimization problem, even when the measurement count is substantially less than what the Nyquist/Shannon theorem prescribes. Despite the reduced sampling rate, CS still allows for the efficient recovery of signals, making it a promising approach for IoT applications.
To address the optimization problem in CS, several efficient algorithms have been developed, including iterative hard thresholding (IHT) [6], the iterative shrinkage-thresholding algorithm (ISTA) [7], the fast iterative shrinkage-thresholding algorithm (FISTA) [8], and approximate message passing (AMP) [9].
The original CS problem involves finding the sparsest solution, defined as

$$\min_{x} \|x\|_0 \quad \text{subject to} \quad y = \Phi x.$$

Given noisy measurements $y = \Phi x + \varepsilon$, the CS problem is typically solved as

$$\min_{x} \frac{1}{2}\|\Phi x - y\|_2^2 + \lambda \|x\|_1,$$

where $\frac{1}{2}\|\Phi x - y\|_2^2$ represents the data fidelity term, and $\lambda$ is a regularization parameter. For example, ISTA updates the estimate as

$$r^{(k)} = x^{(k-1)} - \rho \Phi^\top \left(\Phi x^{(k-1)} - y\right)$$

and applies the thresholding operation:

$$x^{(k)} = \mathrm{sign}(r^{(k)}) \max\left(|r^{(k)}| - \lambda\rho,\, 0\right),$$

where $k$ is the iteration step, and the step size $\rho$ controls the convergence speed and accuracy of the thresholding process.
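For concreteness, the ISTA iteration above can be sketched in a few lines of NumPy; the measurement matrix `Phi`, regularization weight `lam`, and iteration count below are illustrative placeholders, not values from this paper.

```python
import numpy as np

def soft_threshold(v, tau):
    """Element-wise soft thresholding: sign(v) * max(|v| - tau, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista(Phi, y, lam=0.1, rho=None, n_iter=200):
    """Minimize 0.5*||Phi x - y||_2^2 + lam*||x||_1 with ISTA."""
    if rho is None:
        # step size 1/L, where L is the Lipschitz constant of the gradient
        rho = 1.0 / np.linalg.norm(Phi, 2) ** 2
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        r = x - rho * Phi.T @ (Phi @ x - y)   # gradient descent update
        x = soft_threshold(r, lam * rho)      # proximal (shrinkage) step
    return x
```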
The primary drawback of traditional reconstruction algorithms lies in their slow convergence speed. Due to the requirement for extensive iterations, significant computational resources are consumed when dealing with large-scale or high-dimensional datasets, making it difficult to meet efficiency demands. Additionally, the performance of traditional reconstruction algorithms highly depends on the selection of preset parameters, such as regularization thresholds and step sizes. These parameters often need experimental tuning, which increases the algorithm’s complexity and usability challenges.
In recent years, with the significant success of emerging deep learning (DL) techniques in computer vision, numerous DL-based models have been proposed for CS image reconstruction, such as LISTA [10], ISTA-Net [11], and FISTA-Net [12]. Compared to traditional algorithms, these DL-based CS algorithms leverage extensive training data to learn complex signal features, thereby achieving higher-quality reconstructions that better preserve image details and textures. Moreover, deep learning models learn features from data autonomously, without manually designed feature extraction, which sidesteps the hyperparameter issues of traditional algorithms like FISTA. This capability of automatic feature learning gives deep learning methods notable advantages in handling complex, high-dimensional data. Beyond local image features, the global spatial information of an image is also crucial. However, relying solely on convolutional neural networks (CNNs) to learn global information comprehensively is limited by the inherent constraints of stacked convolutional layers, such as restricted effective receptive fields and redundant filters caused by over-parameterization, which can constrain image reconstruction performance. Addressing this challenge, Shen et al. proposed the TransCS model [13], which introduces a custom ISTA-based Transformer backbone that applies iterative gradient descent updates and soft-thresholding operations to represent the global spatial relationships among image patches. Nevertheless, the Transformer architecture in TransCS remains computationally complex. To reduce the iteration count and computational resource consumption, we present FusionOpt-Net, a CS model based on the Transformer and the FISTA algorithm. FusionOpt-Net incorporates a momentum factor and its associated update sequences to accelerate convergence, while integrating the Transformer's global features to achieve superior image reconstruction performance.
FusionOpt-Net, with its high image reconstruction performance and fast computational speed, is particularly effective in eliminating blocking artifacts and restoring image details even at low sampling rates. This makes it highly suitable for real-time image reconstruction tasks, such as video compression and transmission [14]. Moreover, FusionOpt-Net demonstrates excellent image reconstruction capabilities in noisy environments, maintaining high PSNR and SSIM metrics even in the presence of multiple levels of Gaussian noise. This suggests that the model is applicable in fields requiring high-quality image reconstruction under noisy conditions, such as medical imaging [15] and remote sensing image processing. The primary contributions of this paper are as follows:
We propose an innovative framework that integrates FISTA with Transformer networks. Through this integration, we leverage the fast convergence properties of FISTA and the powerful feature extraction capabilities of Transformer networks to significantly enhance the performance of compressive sensing image reconstruction;
We conducted experiments on several public datasets to validate that the proposed FusionOpt-Net model significantly outperforms other image CS reconstruction models in terms of visual quality and quantitative performance metrics.
2. Related Work
2.1. FISTA Algorithm
The fast iterative shrinkage-thresholding algorithm (FISTA) is an accelerated gradient-based method designed to solve sparse linear inverse problems. It builds upon the traditional iterative shrinkage-thresholding algorithm (ISTA) by incorporating momentum acceleration, which significantly enhances convergence speed. Due to its efficiency, FISTA has been widely adopted in fields such as compressed sensing and image reconstruction.
FISTA is formulated to solve optimization problems of the form

$$\min_{x} F(x) = f(x) + g(x).$$

Here, $f(x)$ represents a smooth convex function, typically associated with data fidelity, and is expressed as $f(x) = \frac{1}{2}\|\Phi x - y\|_2^2$; $g(x)$ is a non-smooth but convex regularization term, often chosen as the L1 norm: $g(x) = \lambda \|x\|_1$.

Initialization: The algorithm starts with an initial point $x^{(0)} = z^{(1)}$, the momentum scalar $t_1 = 1$, and an initial step size parameter $\rho$.

Iterative Update: In each iteration, the following update rules are applied:

$$x^{(k)} = \mathrm{prox}_{\rho g}\!\left(z^{(k)} - \rho \nabla f(z^{(k)})\right),$$

where $\rho$ is the step size, typically set to $1/L$, with $L$ being the Lipschitz constant of the gradient of the smooth function $f$. The function $\mathrm{prox}_{\rho g}(\cdot)$ denotes the proximal operator associated with $g$, defined as

$$\mathrm{prox}_{\rho g}(v) = \arg\min_{x} \left\{ g(x) + \frac{1}{2\rho} \|x - v\|_2^2 \right\}.$$

The acceleration parameter $t_k$ is then updated as follows:

$$t_{k+1} = \frac{1 + \sqrt{1 + 4 t_k^2}}{2}.$$

Finally, the auxiliary variable $z^{(k+1)}$ is updated using

$$z^{(k+1)} = x^{(k)} + \frac{t_k - 1}{t_{k+1}} \left( x^{(k)} - x^{(k-1)} \right).$$

Termination: The iterative process continues until a predefined convergence criterion is satisfied, such as when the difference between successive iterates falls below a certain threshold.
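The complete procedure fits in a few lines of code. Below is a minimal NumPy sketch of FISTA for the L1-regularized least-squares problem above; the regularization weight `lam` and iteration count are illustrative defaults, not values from this paper.

```python
import numpy as np

def fista(Phi, y, lam=0.1, n_iter=100):
    """Minimize f(x) + g(x) = 0.5*||Phi x - y||_2^2 + lam*||x||_1 with FISTA."""
    L = np.linalg.norm(Phi, 2) ** 2          # Lipschitz constant of grad f
    rho = 1.0 / L                            # step size 1/L
    x_prev = np.zeros(Phi.shape[1])
    z = x_prev.copy()                        # auxiliary (momentum) variable
    t = 1.0
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ z - y)
        # proximal operator of rho*g: element-wise soft thresholding
        v = z - rho * grad
        x = np.sign(v) * np.maximum(np.abs(v) - lam * rho, 0.0)
        # momentum scalar and Nesterov extrapolation
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = x + ((t - 1.0) / t_next) * (x - x_prev)
        x_prev, t = x, t_next
    return x_prev
```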
FISTA’s primary advantage lies in its enhanced convergence rate and ease of implementation. Specifically, compared to traditional gradient descent and ISTA, FISTA achieves a faster convergence rate by utilizing Nesterov’s momentum. This improvement leads to a theoretical convergence rate of $O(1/k^2)$ compared to ISTA’s $O(1/k)$, making it highly effective for large-scale sparse problems. Furthermore, despite the inclusion of momentum, FISTA maintains a computational complexity comparable to ISTA, ensuring both efficient implementation and execution. Moreover, FISTA exhibits great flexibility, as it can be adapted to various regularization terms, such as the L1 and L2 norms, making it applicable to a broad range of sparse optimization problems. This adaptability has contributed to FISTA’s widespread use as a reliable tool in areas like compressed sensing and image reconstruction.
2.2. Transformer
The Transformer [16] is a deep learning architecture known for its reliance on the self-attention mechanism, which allows it to capture long-range dependencies in sequential data more effectively than traditional RNNs. Its multi-head attention further enhances the model’s ability to learn diverse patterns by processing multiple attention heads in parallel. Unlike RNNs, the Transformer operates with full parallelism, significantly improving training efficiency. Additionally, positional encoding is used to maintain the order of sequences, while residual connections and layer normalization ensure stable training. These features make the Transformer a highly flexible and powerful model, applicable across various domains including natural language processing and computer vision.
While the Transformer has become the standard for NLP tasks, its application in visual tasks still requires more exploration. An experimental approach to image compressed sensing (CS) is CSformer, which adopts a dual-stream, black-box strategy to merge intermediate features from both Transformer and CNN. In contrast, another work, TransCS, applies global attention to natural images through an iterative process, which can be regarded as an unfolded ISTA recovery framework. This method iteratively conducts gradient descent updates and soft-thresholding, providing well-defined interpretability. Additionally, by integrating Transformer and CNN into a hybrid architecture, TransCS excels at managing the relationships between high-level visual semantic features. Consequently, TransCS capitalizes on the strengths of both Transformer and CNN for image CS, learning global dependencies and local features of image patches, leading to hybrid image reconstruction with high recovery quality. However, the traditional ISTA algorithm used in TransCS, although resolving inherent hyperparameter challenges, suffers from slow convergence and low efficiency. To overcome this limitation, we combine the FISTA algorithm with Transformer, incorporating learnable momentum, which not only accelerates convergence but also preserves high reconstruction accuracy.
2.3. Deep Compressed Sensing
The fundamental idea behind deep compressed sensing (DCS) is to utilize a neural network to learn the complex relationship between measurements and the original signal. This approach enhances both the speed and precision of the reconstruction process, thereby improving the overall performance in image sampling and reconstruction. Typically, DCS aims to minimize the expression $\|x - \mathcal{F}^{-1}(y;\theta)\|_2^2$, where $x$ represents the source signal, and $y$ denotes the observation, serving as the network input. The inverse transformation function $\mathcal{F}^{-1}(\cdot;\theta)$, determined by the network’s parameters $\theta$, is optimized through this process. With the ongoing advancements in deep learning, a growing number of DCS algorithms are being introduced.
These algorithms generally fall into two main categories. The first type integrates traditional CS algorithms with deep learning, employing neural networks for both implementation and computation in an iterative manner. This method maintains the stability and dependability of conventional algorithms while enhancing reconstruction quality and speed through deep learning. For example, ISTA-Net substitutes the sparsity constraints in the linear transform domain of traditional optimization-based algorithms with constraints in the nonlinear transform domain of the network. A similar approach is employed in ADMM-CSNet [17], which builds upon the ADMM algorithm. Although these models utilize a data-driven method for reconstruction, they continue to rely on the traditional, manually designed sensing matrix within the sampling module, potentially limiting reconstruction performance. Additionally, NeumNet [18] was introduced by Gilton et al. as a solution for image inverse problems using the Neumann series. While NeumNet offers high-speed image reconstruction, the resulting images are still significantly impacted by blocking artifacts. AMP-Net incorporates the unfolding algorithm AMP into a neural network structure, extending its capabilities. TransCS, on the other hand, introduces a Transformer-based network built on ISTA that captures global dependencies between image sub-blocks while iteratively applying gradient descent and soft-thresholding operations. Furthermore, DRCAMP-Net [19] integrates AMP with extended residual convolution to mitigate block artifacts and broaden the receptive field.
Another approach focuses on deep learning models built on convolutional neural networks (CNNs). These models reconstruct images by stacking convolutional layers, prioritizing the retention of local image features. For example, DR2-Net [20] leverages linear mapping and residual networks for initial and final image reconstruction, while ReconNet achieves this directly through convolutional layers. DPA-Net [21] enhances reconstruction quality by preserving texture details, and MSCRLNet [22] uses multi-scale residual networks to improve shallow feature extraction by concentrating on channels. However, due to the inherent locality of convolutional layers, CNN-based models have limitations in capturing global positional relationships. To address global dependencies, these models often resort to inefficient stacking of convolutional layers to expand the receptive field. Thus, there is a clear need to establish a new DL-based image CS paradigm that effectively captures global relationships among image subblocks.
3. FusionOpt-Net Module
We propose a novel algorithmic framework that integrates the iterative process of FISTA with the feature extraction process of Transformer networks. This architecture combines the iterative algorithm of FISTA-Net with the deep self-attention mechanism of TransCS, achieving superior image reconstruction performance through technical fusion. The data flow is illustrated in Figure 1.
The core of the FusionOpt-Net model is the FISTA-based Transformer backbone. We customize the traditional FISTA by embedding it into the Transformer architecture. This customization allows the Transformer to effectively model the global dependencies among image subblocks, which are crucial for accurately reconstructing images from compressed measurements. In each iteration, the model performs a gradient descent update followed by a soft thresholding operation, which is a typical step in FISTA. This process is then integrated into the Transformer’s multi-head self-attention mechanism. By doing so, the model not only captures local image features but also effectively models the long-range dependencies across the entire image, which is essential for high-quality reconstruction.
In the model architecture diagram, each stage from “stage 1” to “stage n” has an explicit momentum update mechanism designed to accelerate the convergence of the model. As seen in the diagram, the output of each stage (after processing by the proximal mapping module) is combined with the output from the previous stage through a weighted summation (including the momentum term $\frac{t_k - 1}{t_{k+1}}$) to form the input for the next stage. By adding a momentum term, this structure optimizes the current update step using a linear combination of the previous two iteration results. This not only helps to accelerate convergence but also effectively mitigates oscillations during the iterative process. Our momentum module is designed with learnable parameters, and during the entire network training process, these momentum parameters dynamically adjust according to the specific task requirements to ensure faster convergence and higher performance. A minimal sketch of such a learnable momentum combination is given below.
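The following PyTorch sketch illustrates one way the learnable momentum combination of two consecutive stage outputs could be implemented; the module name, the initial value, and the exact combination rule are our assumptions, not the paper's released code.

```python
import torch
import torch.nn as nn

class MomentumUpdate(nn.Module):
    """Learnable momentum combination of the two most recent stage outputs.

    A sketch of the weighted summation described above; the initial value
    and the combination rule are illustrative assumptions.
    """
    def __init__(self, init_momentum: float = 0.01):
        super().__init__()
        # one learnable momentum scalar per stage, tuned by backpropagation
        self.gamma = nn.Parameter(torch.tensor(init_momentum))

    def forward(self, x_curr: torch.Tensor, x_prev: torch.Tensor) -> torch.Tensor:
        # z^(k+1) = x^(k) + gamma * (x^(k) - x^(k-1))
        return x_curr + self.gamma * (x_curr - x_prev)
```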
3.1. Sampling Module
In order to achieve better image reconstruction results, the sampling module of FusionOpt-Net utilizes a data-driven trainable sensing matrix. The sampling module uses a partition function $\mathcal{P}_B(\cdot)$ to divide the original image $X$ into $B \times B$ non-overlapping blocks, followed by a flattening function $\mathcal{V}(\cdot)$ that projects the blocks into vectors. The sensing matrix $\Phi$ is trained through backpropagation using training images, ultimately conforming to a Gaussian distribution. Therefore, the sampling module can be expressed as

$$y = \Phi\, \mathcal{V}(\mathcal{P}_B(X)),$$

where $y$ signifies the measurements produced by the sampling process. Compared to random sensing matrices, the learned ones are more efficient for hardware implementation and demand less storage capacity.
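A minimal PyTorch sketch of such a block-based sampling module is given below; the block size `B`, the sampling ratio `ratio`, and the Gaussian initialization scale are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SamplingModule(nn.Module):
    """Block-based sampling y = Phi * vec(P(X)) with a trainable sensing matrix."""
    def __init__(self, B: int = 32, ratio: float = 0.1):
        super().__init__()
        n = B * B
        m = max(1, int(round(ratio * n)))
        self.B = B
        # Gaussian-initialized sensing matrix, refined by backpropagation
        self.Phi = nn.Parameter(torch.randn(m, n) / (n ** 0.5))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, H, W) with H, W divisible by B
        b, c, H, W = x.shape
        # partition into non-overlapping B x B blocks and flatten each block
        blocks = x.unfold(2, self.B, self.B).unfold(3, self.B, self.B)
        blocks = blocks.reshape(b, c, -1, self.B * self.B)   # (b, c, L, B*B)
        return blocks @ self.Phi.t()                          # (b, c, L, m)
```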
3.2. Reconstruction Module
The FusionOpt-Net reconstruction module includes two submodules: initial reconstruction and deep reconstruction.
3.2.1. Initial Reconstruction
The initial reconstruction module is a key component of the FusionOpt-Net framework, with its primary task being the initial reconstruction of the image after sampling. This module is implemented through a trainable initial reconstruction matrix $\Phi_{init}$. The matrix $\Phi_{init}$ is initialized as the transpose of the sampling matrix $\Phi$, i.e., $\Phi_{init} = \Phi^\top$. This initialization method leverages the structural information of the sampling matrix, contributing to the stability of the initial reconstruction. The sampled image representation $y$ undergoes a linear transformation using the initial reconstruction matrix $\Phi_{init}$, yielding the initial reconstructed image $x^{(0)}$. This process is expressed as

$$x^{(0)} = \Phi_{init}\, y,$$

where $\Phi_{init}\, y$ represents the initial reconstruction operation. Relying solely on the initial reconstruction module may lead to artifacts and missing details in the initial reconstructed image because the initial reconstruction process only performs a simple linear transformation. To improve reconstruction quality and reduce artifacts, the deep reconstruction module further refines the reconstruction based on this initial output.
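A corresponding sketch of the initial reconstruction, with the matrix initialized as the transpose of the learned sensing matrix and then trained independently, might look as follows (names are hypothetical):

```python
import torch
import torch.nn as nn

class InitReconstruction(nn.Module):
    """Linear initial reconstruction x0 = Phi_init * y, with Phi_init = Phi^T at start."""
    def __init__(self, Phi: torch.Tensor):
        super().__init__()
        # initialize from the transpose of the sensing matrix, then train freely
        self.Phi_init = nn.Parameter(Phi.t().detach().clone())

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        # y: (..., m) block measurements -> (..., B*B) initial block estimates
        return y @ self.Phi_init.t()
```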
3.2.2. Deep Reconstruction
The deep reconstruction module is implemented using a FISTA-based Transformer backbone network together with a CNN. The Transformer backbone guides the solution of the general $\ell_1$-norm optimization problem at each layer, where the threshold and shrinkage values are updated in each iteration. The momentum is learned automatically from the training data. Here, $r^{(k)}$ represents the residual in FISTA, while the current estimate $x^{(k)}$ is obtained from the previous estimate $x^{(k-1)}$.
Inspired by TransCS, we designed a function $\mathcal{P}_B(\cdot)$ that partitions the input into non-overlapping $B \times B$ blocks. The iterative shrinkage-thresholding operation is then expressed as

$$r^{(k)} = z^{(k)} - \rho_k \Phi^\top \left(\Phi z^{(k)} - y\right),$$

where $z^{(k)}$ is the input at the $k$-th iteration, $r^{(k)}$ denotes the output at the same iteration, and $\rho_k$ is the step size updated at each iteration according to traditional FISTA.
Next, the pre-processing module (piling residual layers) refines the output $r^{(k)}$ to reduce noise and preserve high-quality details through convolutional layers, learned from training data. This process is expressed as

$$d^{(k)} = \mathcal{C}^{(k)}\!\left(\mathcal{V}^{-1}(r^{(k)})\right),$$

where $\mathcal{V}^{-1}(\cdot)$ denotes the inverse vectorization function, and $\mathcal{C}^{(k)}(\cdot)$ represents the $k$-th convolutional layer stack. The pre-processing module consists of six layers, each with a $3 \times 3$ kernel size. The first and last layers have one channel, while the middle layers have 32 channels.
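A sketch of this six-layer stack in PyTorch is shown below; the 1–32–…–32–1 channel layout follows the description above, while the activation placement is an assumption.

```python
import torch.nn as nn

def make_preprocessing_block(mid_channels: int = 32, n_layers: int = 6) -> nn.Sequential:
    """Six-layer convolutional refinement stack used as the pre-processing module."""
    layers = []
    for i in range(n_layers):
        in_ch = 1 if i == 0 else mid_channels
        out_ch = 1 if i == n_layers - 1 else mid_channels
        # 3x3 convolutions with padding to preserve spatial size
        layers.append(nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1))
        if i < n_layers - 1:
            layers.append(nn.ReLU(inplace=True))   # activation placement assumed
    return nn.Sequential(*layers)
```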
In the coding module of the Transformer, FusionOpt-Net deep reconstruction uses the embedded positions of image patches to compress the sequence of input tokens. The positional encoding (PE) is used to retain the spatial relationships between image patches. The final result is a matrix representing the encoded sequence, expressed as

$$e^{(k)} = \mathcal{T}_{enc}\!\left(\mathcal{P}_B(d^{(k)}) + PE\right),$$

where $\mathcal{T}_{enc}(\cdot)$ is the Transformer encoder function, and $\mathcal{P}_B(\cdot)$ represents a function that partitions the input into non-overlapping blocks. After encoding, the representation $e^{(k)}$ undergoes element-wise soft thresholding to reduce noise and improve sparsity. This process is expressed as

$$s^{(k)} = \mathrm{sign}(e^{(k)}) \cdot \mathrm{ReLU}\!\left(|e^{(k)}| - \theta_k\right),$$

where $\mathrm{sign}(\cdot)$ is the sign function, $\mathrm{ReLU}(\cdot)$ is an activation function, $|\cdot|$ is the absolute value function, and $\theta_k$ is the current threshold.
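The element-wise soft-thresholding step with a learnable per-stage threshold can be sketched as follows; the initial threshold value is an illustrative assumption.

```python
import torch
import torch.nn as nn

class SoftThreshold(nn.Module):
    """Element-wise soft thresholding s = sign(e) * ReLU(|e| - theta)."""
    def __init__(self, init_theta: float = 0.01):
        super().__init__()
        # learnable per-stage threshold, tuned during end-to-end training
        self.theta = nn.Parameter(torch.tensor(init_theta))

    def forward(self, e: torch.Tensor) -> torch.Tensor:
        return torch.sign(e) * torch.relu(torch.abs(e) - self.theta)
```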
The result of the soft thresholding $s^{(k)}$ is combined with the pre-processed result $d^{(k)}$, and a weighted update is performed:

$$x^{(k)} = (1 - w_k)\, d^{(k)} + w_k\, \mathcal{T}_{dec}^{(k)}(s^{(k)}),$$

where $w_k$ is the weight factor, and $\mathcal{T}_{dec}^{(k)}(\cdot)$ represents the decoder function at the $k$-th iteration.
After the deep module, a post-processing module is designed, which is expressed as

$$\tilde{x}^{(k)} = \mathcal{D}^{(k)}\!\left(\mathcal{P}_B^{-1}(x^{(k)})\right),$$

where $\mathcal{P}_B^{-1}(\cdot)$ represents the inverse partition function, and $\mathcal{D}^{(k)}(\cdot)$ denotes the convolutional layer configuration in the post-processing block.
The updated vectorization function $\mathcal{V}(\cdot)$ then reprojects the reconstruction result, allowing the image blocks to proceed to the next iteration:

$$x^{(k)} \leftarrow \mathcal{V}(\tilde{x}^{(k)}).$$

Intermediate variables $t_k$ and $z^{(k)}$ are updated for the next iteration with momentum strategies to accelerate convergence:

$$t_{k+1} = \frac{1 + \sqrt{1 + 4 t_k^2}}{2}, \qquad z^{(k+1)} = x^{(k)} + \frac{t_k - 1}{t_{k+1}} \left( x^{(k)} - x^{(k-1)} \right).$$

The final reconstruction result is obtained by applying the inverse vectorization function $\mathcal{V}^{-1}(\cdot)$ to the final iteration output:

$$\hat{X} = \mathcal{V}^{-1}(x^{(n)}).$$
The parameter changes during each iteration follow the fixed pattern described above. Consequently, we present Algorithm 1 to illustrate the reconstruction process.
Algorithm 1 Forward Propagation for Image Recovery

Require: number of iteration stages $n$, initial reconstruction matrix $\Phi_{init}$, soft thresholds $\{\theta_k\}$, weight coefficients $\{w_k\}$, iteration step sizes $\{\rho_k\}$, measurements $y$, scalars for momentum update $\{t_k\}$, sensing matrix $\Phi$
Ensure: reconstructed image $\hat{X}$
1: Trainable hyperparameters: $\Phi$, $\Phi_{init}$, $\{\theta_k, w_k, \rho_k\}_{k=1}^{n}$
2: Initialization: $x^{(0)} = \Phi_{init}\, y$, $z^{(1)} = x^{(0)}$, $t_1 = 1$, $k = 1$
3: Begin the iteration:
4: while $k \le n$ do
5:   $r^{(k)} = z^{(k)} - \rho_k \Phi^\top(\Phi z^{(k)} - y)$ (gradient descent update)
6:   $d^{(k)} = \mathcal{C}^{(k)}(\mathcal{V}^{-1}(r^{(k)}))$ (pre-processing)
7:   $e^{(k)} = \mathcal{T}_{enc}(\mathcal{P}_B(d^{(k)}) + PE)$ (Transformer encoding)
8:   $s^{(k)} = \mathrm{sign}(e^{(k)}) \cdot \mathrm{ReLU}(|e^{(k)}| - \theta_k)$ (soft thresholding)
9:   $x^{(k)} = (1 - w_k)\, d^{(k)} + w_k\, \mathcal{T}_{dec}^{(k)}(s^{(k)})$ (weighted update)
10:  $x^{(k)} \leftarrow \mathcal{V}(\mathcal{D}^{(k)}(\mathcal{P}_B^{-1}(x^{(k)})))$ (post-processing and re-vectorization)
11:  $t_{k+1} = \big(1 + \sqrt{1 + 4 t_k^2}\big)/2$ (momentum scalar update)
12:  $z^{(k+1)} = x^{(k)} + \frac{t_k - 1}{t_{k+1}}(x^{(k)} - x^{(k-1)})$ (momentum update)
13:  $k = k + 1$
14: end while
15: return $\hat{X} = \mathcal{V}^{-1}(x^{(n)})$
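For readability, the stage loop of Algorithm 1 can be sketched in PyTorch as follows; the per-stage module interface (`stages[k]` and its `rho` attribute) is hypothetical and simply bundles the pre-processing, Transformer encoding/decoding, soft-thresholding, weighting, and post-processing steps described above.

```python
import torch

def fusionopt_forward(y, Phi, init_recon, stages, n_stages):
    """Sketch of Algorithm 1: FISTA-style unfolded reconstruction."""
    x_prev = init_recon(y)          # x^(0) = Phi_init y
    z = x_prev                      # z^(1) = x^(0)
    t = 1.0
    for k in range(n_stages):
        # gradient descent update on the data-fidelity term
        r = z - stages[k].rho * (Phi.t() @ (Phi @ z - y))
        # proximal mapping: CNN + Transformer + soft thresholding + weighting
        x = stages[k](r)
        # momentum scalar and auxiliary-variable updates
        t_next = (1.0 + (1.0 + 4.0 * t * t) ** 0.5) / 2.0
        z = x + ((t - 1.0) / t_next) * (x - x_prev)
        x_prev, t = x, t_next
    return x_prev
```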
3.3. Loss Function
During the training of FusionOpt-Net, we simultaneously refine the sampling module and the recovery module, with the original images serving as both inputs and training labels. The parameters to be trained for the $k$-th stage of the deep reconstruction are denoted by $\Theta_k = \{\theta_k, w_k, \rho_k\}$, while for the $n$ stages, the collective trainable parameters are indicated by $\Theta = \{\Phi, \Phi_{init}, \Theta_1, \ldots, \Theta_n\}$. To automatically train the initialization and deep reconstruction modules from the measured values $y$, we measure the differences between the source and recovered images using the mean squared error (MSE). We define the loss function as

$$\mathcal{L}(\Theta) = \frac{1}{N} \sum_{i=1}^{N} \left\| \hat{X}_i - X_i \right\|_2^2,$$

where $X_i$ represents the $i$-th training image, $\hat{X}_i$ is its reconstruction, and $N$ is the total number of training images.
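In PyTorch terms, this end-to-end objective reduces to a mean-squared-error criterion over reconstructed batches; the `model` interface below, which maps an image batch through the sampling and reconstruction modules, is an assumption.

```python
import torch
import torch.nn.functional as F

def reconstruction_loss(model, images: torch.Tensor) -> torch.Tensor:
    """End-to-end MSE training loss: sampling and recovery trained jointly."""
    recon = model(images)             # X_hat = recover(sample(X))
    # mean squared error between reconstructed and source images
    return F.mse_loss(recon, images)
```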
4. Experimental Results
In this section, several experiments are conducted to verify the performance of the proposed method. Firstly, in Section 4.2, the FusionOpt-Net method is compared with other models on several public datasets. Subsequently, in Section 4.3, the robustness of the FusionOpt-Net method is tested on images corrupted by multi-level Gaussian noise.
4.1. Experimental Settings
The FusionOpt-Net training dataset is derived from the BSD500 dataset [23], comprising 200 training images, 100 validation images, and 200 testing images. The validation dataset we use is Set11. We randomly segment the images in the training dataset into 200 sub-images each, measuring 96 × 96 pixels, creating a total of 100,000 sub-images. To augment the data, we apply random horizontal and vertical flips, rotations, and scaling to enhance image diversity. The experimental results are evaluated on three widely used benchmarks: Set11 [24], BSD200 [23], and Urban100 [25].
The FusionOpt-Net training process follows the same settings as existing DL-based CS methods (such as ISTA-Net). The patch size $P$ is set to 8, the initial step size is 1.0, and the regularization parameter $\lambda$ is initialized to 0.1. The initial value of the momentum scalar is 0.01, and the number of iteration stages $H$ is set to 8. Training is conducted for 200 epochs with a batch size of 64. The learning rate decays from the 101st to the 150th epoch, and the last 50 epochs are trained with a constant learning rate. We use the Adam optimizer for training. The FusionOpt-Net model is compared with several state-of-the-art methods, including CSformer [26], ISTA-Net+ [11], CSNet [27], AMP-Net [28], and TransCS, all of which combine traditional algorithms with deep learning models. Performance evaluation is carried out using the perceptual metrics PSNR and SSIM; higher PSNR and SSIM values indicate better performance. The models compared to FusionOpt-Net are obtained from their respective sources and executed with default configurations. To ensure an unbiased evaluation, all training images for the rival models are sourced from the BSD500 dataset. The experiments are conducted using the PyTorch 1.9.0 framework on a system with an Intel Xeon 8336 CPU and a GeForce RTX 4090 GPU.
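A sketch of an optimizer and learning-rate schedule consistent with these settings is shown below; the base learning rate and the decay factor are assumptions, since the paper does not state them here.

```python
import torch

def build_optimizer(model, base_lr: float = 1e-4):
    """Adam optimizer with a decaying schedule, per the settings above."""
    optimizer = torch.optim.Adam(model.parameters(), lr=base_lr)

    def lr_lambda(epoch):            # epoch is 0-indexed
        if epoch < 100:
            return 1.0               # constant during the first 100 epochs
        if epoch < 150:
            # linear decay between epochs 101 and 150 (decay factor assumed)
            return 1.0 - 0.9 * (epoch - 100) / 50
        return 0.1                   # constant for the last 50 epochs

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler
```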
4.2. Comparisons with State-of-the-Art Methods
In our study, we conducted a comprehensive evaluation of CSformer, ISTA-Net+, AMP-Net, CSNet, TransCS, and our proposed model across the Set11, BSD200, and Urban100 datasets at sampling rates of 0.04, 0.1, 0.25, and 0.5. The evaluation metrics used were the peak signal-to-noise ratio (PSNR, dB) and the structural similarity index (SSIM).
Table 1 presents the detailed experimental results.
The FusionOpt-Net model consistently demonstrated significant advantages across all datasets and sampling rates:
Set11 Dataset: At a sampling rate of 0.04, our model achieved a PSNR of 25.34 and SSIM of 0.7815, both superior to other models. At higher rates like 0.5, our model further demonstrated superiority with a PSNR of 39.91 and SSIM of 0.9809, notably higher than ISTA-Net+ (38.07) and TransCS (38.88).
BSD200 Dataset: Across various sampling rates, our model consistently outperformed others. For instance, at a rate of 0.25, our model achieved a PSNR of 31.91 and SSIM of 0.9237, surpassing TransCS (PSNR 31, SSIM 0.9171) and ISTA-Net+ (PSNR 29.51, SSIM 0.8659). At a 0.5 sampling rate, our model reached a PSNR of 37.06 and SSIM of 0.9748, reaffirming its superior performance.
Urban100 Dataset: At a low sampling rate of 0.04, our model led with a PSNR of 22.05 and SSIM of 0.6619. At a 0.5 sampling rate, our model achieved a PSNR of 35.51 and SSIM of 0.9758, significantly surpassing ISTA-Net+ (PSNR 34.58, SSIM 0.9661) and TransCS (PSNR 34.16, SSIM 0.9687).
On average, our model exhibited the highest PSNR (32.32) and SSIM (0.9), significantly outperforming other models. These results underscore the capability of our model to consistently deliver high-quality reconstructed images across different datasets and sampling rates. Moreover, its robust performance at low sampling rates (e.g., 0.04 and 0.1) highlights its efficacy in sparse data scenarios.
Visual comparisons between our model and competing methods further validate our findings. As shown in Figure 2, our approach excelled in detail preservation, texture reconstruction, and edge sharpness, notably outperforming ISTA-Net+, AMP-Net, and TransCS. Specifically, our model accurately reproduced complex structures, such as natural shadow transitions in portrait images and sharp patterns in butterfly wings, reducing blurring effects significantly compared to other methods.
In conclusion, our model exhibits superior performance in compressive sensing image reconstruction tasks, as evidenced by both quantitative metrics (PSNR, SSIM) and qualitative visual assessments. These findings underscore the effectiveness and potential application value of our proposed method in the field of image reconstruction.
4.3. Noise Robustness
To assess image reconstruction robustness in various noisy environments, Gaussian noise with a mean of zero and multiple standard deviations $\sigma$ was added to the BSD200 test dataset. The noise robustness of FusionOpt-Net was compared with three deep learning models (ISTA-Net+, AMP-Net, and TransCS). PSNR and SSIM metrics were used for evaluation at four sampling rates, along with visual comparisons at each noise level. Additionally, average PSNR and SSIM values for the different noise levels are provided for all four reconstruction methods. The results are shown in Table 2.
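For reference, the noise injection and PSNR evaluation used in this kind of robustness test can be sketched as follows; this is a generic recipe, not the authors' evaluation script.

```python
import torch

def add_gaussian_noise(images: torch.Tensor, sigma: float) -> torch.Tensor:
    """Add zero-mean Gaussian noise with standard deviation sigma."""
    return images + sigma * torch.randn_like(images)

def psnr(x: torch.Tensor, ref: torch.Tensor, peak: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for images scaled to [0, peak]."""
    mse = torch.mean((x - ref) ** 2)
    return (10.0 * torch.log10(peak ** 2 / mse)).item()
```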
Firstly, the average peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) are compared under different noise levels. Regardless of the noise level, the FusionOpt-Net model achieves the highest PSNR and SSIM values in most situations. This indicates that the FusionOpt-Net model maintains superior image quality and structural details over other models across varying noise levels. For instance, as the noise level increases, the PSNR of the FusionOpt-Net model decreases from 28.26 to 23.68, whereas ISTA-Net+ falls from an already lower 25.76. These data indicate that the FusionOpt-Net model maintains a higher PSNR in noisier environments and outperforms other models in preserving visual structures. The FusionOpt-Net model is able to effectively suppress noise and preserve the high-frequency information of the image structure under various noise conditions.
In particular, under high noise levels and sampling rates, the FusionOpt-Net model still demonstrates excellent visual consistency and structural delineation, as shown in Figure 3. Compared to ISTA-Net+, AMP-Net, and TransCS, the FusionOpt-Net model shows stronger robustness and reliability in various noise environments, achieving superior visual effects and reliable results in practical applications.
4.4. Complexity Analysis
We conduct a model complexity analysis of FusionOpt-Net and several competing methods (ISTA-Net+, CSNet, CSformer, AMP-9BM, TransCS) across three dimensions: average runtime, the number of giga floating-point operations (GFLOPs), and the number of parameters. The average runtime assesses the time required for the model to compress and reconstruct an image. GFLOPs are used to evaluate the computational complexity, while the number of parameters reflects the spatial complexity of the model. These metrics are derived by forward propagating a single 256 × 256 image at a 0.1 sampling rate, as illustrated in Table 3 and Figure 4.
FusionOpt-Net achieves a computational time of 0.026 s on the RTX 4090 GPU at a 0.1 sampling rate, making it highly efficient and suitable for real-time applications. Although slightly slower than the fastest model, CSNet (0.008 s), it remains competitive with methods like ISTA-Net+ (0.023 s) and AMP-Net (0.017 s), highlighting a balance between complexity and performance. The parameter count of 1.445 M, comparable to TransCS (1.489 M), reflects FusionOpt-Net’s enhanced feature extraction capabilities, justifying the trade-off for improved reconstruction quality and flexibility. With a moderate computational complexity of 12.011 GFLOPs, FusionOpt-Net is optimized for efficiency without compromising performance, making it a strong candidate for scenarios requiring high precision and resource-conscious deployments.
4.5. Ablation Studies
To verify the efficacy of the momentum module, we further conduct ablation studies on BSDS100. The models compared are FusionOpt-Net and FusionOpt-Net without the momentum module. From the results, as shown in Figure 5, we can observe the following: the momentum module improves the reconstruction quality of an image and plays a particularly important role at high sampling rates. This is probably because the module acts as a residual-like structure in the overall architecture, which improves the stability of the deep learning model during image reconstruction, yielding a recovered result of higher quality that is more similar to the original image.
5. Conclusions
This paper introduces a novel compressed sensing image reconstruction algorithm that integrates the FISTA and Transformer networks. By combining the fast convergence properties of FISTA with the powerful feature extraction capabilities of Transformer networks, we have developed an efficient and high-quality image reconstruction method. The experimental results demonstrate that the FusionOpt-Net model exhibits significantly superior reconstruction performance across multiple image datasets, outperforming existing models such as ISTA-Net+ and TransCS in terms of metrics like PSNR and SSIM. Particularly noteworthy is its ability to preserve fine details and suppress noise effectively, especially in scenarios with high noise levels and low sampling rates, showcasing robustness in diverse environments.
In comparison to traditional algorithms, the FusionOpt-Net model not only avoids complex hyperparameter tuning but also leverages deep learning to automatically learn features from data, thereby substantially enhancing image reconstruction quality. Future work will focus on further optimizing algorithmic efficiency and exploring the model’s potential in other compressed sensing applications, aiming to achieve efficient and high-quality image reconstruction in broader contexts. This study provides new insights into advancing compressed sensing reconstruction algorithms and establishes a solid foundation for practical image reconstruction tasks.