1. Introduction
As a joint framework of sampling and compression, compressive sensing (CS) [1,2] shows that if a signal is sparse in some domain, it can be perfectly reconstructed from fewer samples than the Nyquist rate requires. This characteristic demonstrates two great potentials in signal acquisition and processing. First, as the number of samples is greatly reduced, devices with limited sensor size can obtain high-definition information using low-definition sensors.
Figure 1 shows the architecture of the single-pixel camera [3]. With a sensor of only one pixel, this system can capture a complete image. Second, the CS framework transfers the computational burden to the decoding side. For some energy-limited applications, such as wireless sensor networks, this advantage can greatly extend the life cycle of the nodes. As the encoding side is simplified, the performance of the system depends largely on the performance of the decoding side, namely, the “Recovery method” part in Figure 1. This paper focuses on the recovery method of image CS. Due to the advantages mentioned above, CS has been applied in many fields, such as digital imaging [3], background subtraction [4], medical imaging [5], and remote sensing [6].
In the framework of compressive sensing, a one-dimensional sparse signal can be reconstructed by solving an ℓ₀-norm minimization problem. Since ℓ₀-norm minimization is non-convex and NP-hard, the ℓ₀-norm is often replaced by the ℓ₁-norm. It has been proved that these two norms are equivalent in most cases [2], and many CS recovery methods have been proposed, such as the iterative thresholding algorithm [7], orthogonal matching pursuit [8], and the split Bregman algorithm [9].
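As an illustration of how such recovery methods operate, the following is a minimal NumPy sketch of the iterative shrinkage-thresholding algorithm (a basic member of the iterative thresholding family) applied to the ℓ₁-regularized problem min½‖Ax − y‖² + λ‖x‖₁. The matrix, signal sizes, and parameter values are arbitrary demonstration choices, not settings from any cited work.

```python
import numpy as np

def ista(A, y, lam=0.01, step=None, n_iter=2000):
    """Iterative shrinkage-thresholding for min_x 0.5*||Ax - y||^2 + lam*||x||_1."""
    if step is None:
        # 1/L, with L the Lipschitz constant of the gradient (squared spectral norm of A)
        step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - step * (A.T @ (A @ x - y))                        # gradient step on the data term
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-thresholding (shrinkage)
    return x

# Toy compressive sensing setup: k-sparse signal, Gaussian measurement matrix.
rng = np.random.default_rng(0)
n, m, k = 200, 80, 5
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
y = A @ x_true                                                    # compressive measurements
x_hat = ista(A, y)
```

With far fewer measurements than the signal length (m = 80 versus n = 200), the ℓ₁ solution closely recovers the 5-sparse signal, which is the behavior the ℓ₀/ℓ₁ equivalence result guarantees for suitable matrices.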
For image compressive sensing, the key issue is how to exploit the intrinsic prior information of images. As the model of prior knowledge has a significant impact on the performance of image compressive sensing algorithms, many kinds of regularization have been developed. Conventional regularization terms, such as the Mumford–Shah (MS) model [10] and total variation (TV) [7,11,12,13], are established under the assumption that images are locally smooth. For example, Li et al. [13] proposed a TV-based CS algorithm and developed an efficient augmented Lagrangian method to solve it. Candès et al. [11] enhanced the sparsity of the TV norm via a weighting strategy. However, these regularizations only consider the local smoothness of images and cannot restore details and textures well. The TV norm also favors piecewise constant solutions, resulting in oversmoothing. To overcome this problem and improve performance, many compressive sensing methods utilize the prior information of transform coefficients [14,15,16]. Kim et al. [15] modeled the statistical dependencies between transform coefficients with a Gaussian Scale Mixture (GSM) and achieved better reconstruction performance.
In the past few years, sparse representation has emerged and demonstrated good performance in various image processing tasks [17,18,19,20,21]. The purpose of sparse representation is to represent a signal with as few atoms as possible from a learned over-complete dictionary. Compared with a fixed dictionary, a learned dictionary can better express the sparsity of images. However, dictionaries are generally learned from external clean images, and the learning may suffer from high computational complexity.
Recently, inspired by nonlocal means (NLM) [22], many algorithms based on nonlocal self-similarity have been proposed [23,24,25,26,27,28,29]. Dabov et al. proposed a Block-Matching and 3D filtering (BM3D) algorithm for image denoising [23]. In BM3D, similar patches in a degraded image are grouped into 3D arrays and collaborative filtering is performed in the 3D transform domain. Egiazarian et al. extended BM3D to compressive sensing and proposed BM3D-CS. Zhang et al. [26] proposed a structural group sparsity representation (SGSR) model to enforce image sparsity in an adaptive SVD domain. Dong et al. [28] proposed a nonlocal low-rank regularization (NLR) to exploit self-similarity and applied it to the reconstruction of photographic and MRI images. In [29], Zha et al. incorporated a non-convex penalty function into group sparse representation and obtained state-of-the-art reconstruction performance. Gao et al. [30] proposed using Z-score standardization to improve the sparse representation ability of patch groups. Keshavarzian et al. [31] proposed utilizing principal component analysis (PCA) to learn a dictionary for each group and introduced a non-convex ℓp-norm regularization to better promote the sparsity of the patch group coefficients. In [32], an internal self-adaptive dictionary and an external learned dictionary were used to encode a patch group alternately, achieving better performance than a single dictionary.
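To make the patch-grouping step underlying these nonlocal methods concrete, here is a simplified sketch of BM3D-style block matching: the K patches most similar to a reference patch are collected from a local search window and stacked into a 3D group. The patch size, window radius, and group size are illustrative choices, not the settings of any cited paper.

```python
import numpy as np

def group_similar_patches(img, ref_xy, patch=8, window=20, K=16):
    """Stack the K patches most similar (in Euclidean distance) to the
    reference patch at ref_xy, searched within a local window, into a
    K x patch x patch group, as done in BM3D-style nonlocal methods."""
    H, W = img.shape
    rx, ry = ref_xy
    ref = img[rx:rx + patch, ry:ry + patch]
    candidates = []
    for i in range(max(0, rx - window), min(H - patch, rx + window) + 1):
        for j in range(max(0, ry - window), min(W - patch, ry + window) + 1):
            p = img[i:i + patch, j:j + patch]
            candidates.append((np.sum((p - ref) ** 2), i, j))  # patch-matching distance
    candidates.sort(key=lambda t: t[0])                        # most similar first
    return np.stack([img[i:i + patch, j:j + patch] for _, i, j in candidates[:K]])

# Toy usage on a random image; the reference patch itself has distance zero,
# so it is always the first member of its own group.
rng = np.random.default_rng(1)
img = rng.standard_normal((64, 64))
group = group_similar_patches(img, (20, 20))
```

Collaborative filtering or group-wise sparse coding is then applied to each such 3D group; the grouping step is what allows a 3D transform or learned group dictionary to exploit the redundancy among similar patches.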
Another idea is to exploit both local sparsity and nonlocal self-similarity [33,34,35,36,37]. For example, Zhang et al. [33] combined local anisotropic total variation with nonlocal 3D sparsity, naming the result the Collaborative Sparsity Measure (CoSM). Different from the work in [33], Eslahi et al. [37] used the curvelet transform to enforce local patterns. In [34], Dong et al. utilized a local patch-based sparsity and a nonlocal self-similarity constraint to balance the trade-off between adaptation and robustness. Zhou et al. [38] proposed a data-adaptive kernel regressor to extract local structure and used a nonlocal means filter to enforce nonlocal information.
With the development of deep learning, many convolutional neural network (CNN)-based image compressive sensing algorithms have been proposed. For example, Kulkarni et al. [39] proposed a non-iterative and parallelizable CNN architecture to obtain an initial recovery, which is then fed into an off-the-shelf denoiser to produce the final image. Zhang et al. [40] cast the Iterative Shrinkage-Thresholding Algorithm (ISTA) into a CNN framework and developed an effective strategy to solve it. In [41], low-rank tensor factor analysis was utilized to capture nonlocal correlation, and a deep convolutional architecture was adopted to accelerate the matrix inversion in CS. DR²-Net [42] utilized a linear mapping to reconstruct a preliminary image and used residual learning to further improve reconstruction quality. Yang et al. [43] unrolled the Alternating Direction Method of Multipliers (ADMM) into a deep architecture and proposed ADMM-CSNet. Zhang et al. [44] proposed an optimization-inspired explicable deep network, OPINE-Net, whose parameters are all learned end-to-end using back-propagation.
In this paper, we propose a Hybrid NonLocal Sparsity Regularization (HNLSR) for image compressive sensing. First, different from the methods mentioned above, two nonlocal self-similarity constraints are applied simultaneously to exploit the intrinsic sparsity of images. Second, since fixed dictionaries are universal while learned dictionaries are more adaptive to the image itself, both a fixed 3D transform and a 2D self-adaptive dictionary are utilized to combine their advantages. Finally, for the non-convex HNLSR model, we use the split Bregman iteration to divide it into several subproblems, making it easier and more efficient to solve. The flowchart is illustrated in Figure 2. Experimental results show that the proposed HNLSR-CS outperforms both model-based algorithms and deep learning-based algorithms.
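The actual HNLSR subproblems are derived in Section 3, but the general structure of a split Bregman solver can be sketched on the simpler ℓ₁ problem min½‖Ax − y‖² + λ‖x‖₁: an auxiliary variable d splits the non-smooth term, and each iteration alternates a quadratic subproblem, a shrinkage subproblem, and a Bregman variable update. The splitting variable d, multiplier b, and all parameter values below are illustrative assumptions for this toy problem.

```python
import numpy as np

def soft(z, t):
    """Soft-thresholding operator, the closed-form solution of the l1 subproblem."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def split_bregman_l1(A, y, lam=0.01, mu=1.0, n_iter=300):
    """Split Bregman for min_x 0.5*||Ax - y||^2 + lam*||x||_1 via the split d = x."""
    n = A.shape[1]
    AtA, Aty = A.T @ A, A.T @ y
    M = np.linalg.inv(AtA + mu * np.eye(n))  # small-scale demo: direct inverse
    d = np.zeros(n)
    b = np.zeros(n)
    for _ in range(n_iter):
        x = M @ (Aty + mu * (d - b))         # quadratic subproblem (linear system)
        d = soft(x + b, lam / mu)            # l1 subproblem (shrinkage)
        b = b + x - d                        # Bregman (multiplier) update
    return d

# Same toy compressive sensing setup as before: k-sparse signal, Gaussian matrix.
rng = np.random.default_rng(0)
n, m, k = 200, 80, 5
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
y = A @ x_true
x_hat = split_bregman_l1(A, y)
```

The appeal of this scheme, and the reason we adopt it for HNLSR, is that each subproblem has an efficient solution even though the joint objective does not.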
The remainder of this paper is organized as follows.
Section 2 introduces the related works. In
Section 3, we present the proposed method. The experiment and analysis are elaborated in
Section 4.
Section 5 concludes the paper.
5. Conclusions and Future Work
This paper proposes a Hybrid Nonlocal Sparsity Regularization (HNLSR) method for image compressive sensing. Different from existing methods, HNLSR does not consider the local sparsity of images but uses two dictionaries to explore nonlocal self-similarity. The 2D dictionary is self-adaptive and the 3D dictionary is fixed, combining the adaptability and versatility of the two kinds of dictionary. An effective framework based on SBI is presented to solve the optimization problem. The convergence and stability of the proposed method have also been verified. Experimental results show that, compared with methods based on combined local and nonlocal regularization or on a single nonlocal regularization, the proposed method performs better than most existing image compressive sensing methods in both objective quality assessment and visual quality.
As multiple dictionaries can improve performance, we are considering several research directions. One is learning different dictionaries for different areas of an image (e.g., smooth areas and textured areas). Another is learning multi-scale dictionaries and selecting them adaptively according to the parameters. Our future work includes extending the proposed method to other image processing tasks (e.g., denoising, deblocking, and deblurring) and to high-dimensional data (e.g., videos and multispectral images). For high-dimensional or multi-frame data, how to collect similar patches (intra- or inter-frame) is also a problem to be solved.