### *3.1. Flexible Partitioning by Mean Information Entropy (MIE) and Texture Structure (TS)*

Reasonable block partitioning reduces the information entropy (IE) of each sub-block, which improves the performance of the BCS algorithm at the same total sampling rate (TSR) and ultimately improves the quality of the entire reconstructed image. In this paper, we adopt flexible partitioning with image sub-block shape *n* = *row* × *column* = *l* × *h*, guided by texture structure (TS) and mean information entropy (MIE), to remove the blindness of image partitioning with the primary shape *n* = *B* × *B*. The expression of TS is based on the gray-tone spatial-dependence matrices and the angular second moment (ASM) [26,27]. The value of TS is defined as follows using ASM:

$$\begin{aligned} g\_{\text{TS}} &= \sum\_{i=0}^{255} \sum\_{j=0}^{255} \{ p(i, j, d, a) \}^2 \\ p(i, j, d, a) &= P(i, j, d, a) / R \end{aligned} \tag{5}$$

where, *P*(*i*, *j*, *d*, *a*) is the (i,j)-th entry in a gray-tone spatial-dependence matrix, *p*(*i*, *j*, *d*, *a*) is the normalized form of *P*(*i*, *j*, *d*, *a*), (*i*, *j*, *d*, *a*) is the neighboring pixel pair with distance *d*, orientation *a*, and gray value (*i*, *j*) in the image, and *R* denotes the number of neighboring resolution cell pairs. The definition of MIE of the whole image is as follows:

$$g\_{\text{MIE}} = \frac{1}{N/n} \sum\_{i=1}^{N/n} \left( -\sum\_{j=0}^{255} e\_{i,j} \log\_2 e\_{i,j} \right) = \frac{1}{T\_1} \sum\_{i=1}^{T\_1} \left( -\sum\_{j=0}^{255} e\_{i,j} \log\_2 e\_{i,j} \right) \tag{6}$$

where, *e<sub>i,j</sub>* is the proportion of pixels with gray value *j* in the *i*-th sub-image, and *T*<sub>1</sub> = *N*/*n* is the number of sub-images.
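To make Equation (5) concrete, the following is a minimal NumPy sketch (not the authors' code) that builds a gray-tone spatial-dependence matrix for a single, assumed (distance, orientation) offset, normalizes it by the number of neighboring resolution cell pairs *R*, and sums the squared entries as the ASM-based value *g*<sub>TS</sub>; the function name and the restriction to one unsymmetric offset are illustrative assumptions.

```python
# Minimal sketch of Equation (5): texture structure g_TS from one
# gray-tone spatial-dependence (co-occurrence) matrix.
import numpy as np

def texture_structure(img: np.ndarray, d: int = 1, a: str = "horizontal") -> float:
    """img: 2-D uint8 array with gray levels 0..255."""
    # Offset of the neighboring pixel for the chosen distance/orientation.
    dy, dx = {"horizontal": (0, d), "vertical": (d, 0), "diagonal": (d, d)}[a]
    P = np.zeros((256, 256), dtype=np.float64)
    h, w = img.shape
    # Count co-occurring gray-value pairs (i, j) at the given offset.
    for y in range(h - dy):
        for x in range(w - dx):
            P[img[y, x], img[y + dy, x + dx]] += 1
    R = P.sum()                    # number of neighboring resolution cell pairs
    p = P / R                      # normalized matrix p(i, j, d, a)
    return float(np.sum(p ** 2))   # ASM, used here as g_TS
```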

If the flexible partitioning of BCS is reasonable, increasing the similarity between pixels within each sub-block and reducing the MIE over all image sub-blocks inevitably lowers the difficulty of image sampling and recovery; in this sense, flexible partitioning is itself a process of reducing order and rank. Figure 1 shows the effect of different partitioning methods on the MIE of four 256 × 256-pixel test gray images with 256 gray levels when the number of pixels per sub-image is limited to 256. The abscissa represents the different 2-base partitioning modes, and the ordinate represents the MIE of the whole image under each partitioning mode. Figure 1 indicates that images with different structures reach their minimum MIE at different partitioning points, which are then used in flexible partitioning to provide a basis for block segmentation.

**Figure 1.** Effect of different partitioning methods on mean information entropy (MIE) of images (the abscissa represents flexible partitioning with shape *n* = *l* × *h* = 2<sup>*i*−1</sup> × 2<sup>9−*i*</sup> = 256).
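As an illustration of how Equation (6) and the search over the 2-base partitioning modes of Figure 1 could be realized, here is a minimal sketch; the helper names and the even, non-overlapping tiling of a 256 × 256 image are assumptions.

```python
# Minimal sketch of Equation (6): compute the MIE for each 2-base partitioning
# l x h = 2^(i-1) x 2^(9-i) (n = 256 pixels per block) and keep the shape with
# the smallest MIE, as illustrated by Figure 1.
import numpy as np

def block_entropy(block: np.ndarray) -> float:
    counts = np.bincount(block.ravel(), minlength=256)
    e = counts / block.size            # proportion e_{i,j} of each gray value j
    e = e[e > 0]
    return float(-(e * np.log2(e)).sum())

def mean_information_entropy(img: np.ndarray, l: int, h: int) -> float:
    rows, cols = img.shape
    blocks = [img[r:r + l, c:c + h]
              for r in range(0, rows, l) for c in range(0, cols, h)]
    return float(np.mean([block_entropy(b) for b in blocks]))   # average over T1 sub-images

def best_partition(img: np.ndarray, n: int = 256):
    shapes = [(2 ** (i - 1), n // 2 ** (i - 1)) for i in range(1, 10)]   # 1x256, ..., 256x1
    return min(shapes, key=lambda s: mean_information_entropy(img, *s))
```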

Furthermore, the MIE alone guides the partitioning of the image only at the pixel level, i.e., the gray-scale distribution, without considering optimization at the spatial level, i.e., the texture structure. In fact, TS information is also very important for image restoration algorithms. Therefore, this paper combines *g*<sub>MIE</sub> with *g*<sub>TS</sub> to provide the basis for flexible partitioning, that is, the weighted MIE (WM), which is defined as follows:

$$g\_{\text{WM}} = c\_{\text{TS}} \times g\_{\text{MIE}} = f(g\_{\text{TS}}) \times g\_{\text{MIE}} \tag{7}$$

where, *c*<sub>TS</sub> is the weighting coefficient, *f*(∗) is the weighting coefficient function, and its value is related to the TS information *g*<sub>TS</sub>.
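A minimal sketch of Equation (7) follows; since the weighting coefficient function *f*(∗) is not specified in this section, a simple monotone placeholder is assumed here purely for illustration.

```python
# Minimal sketch of Equation (7): weighted MIE. The weighting function f is an
# assumed placeholder, not the paper's actual choice.
def weighted_mie(g_ts: float, g_mie: float, f=lambda t: 1.0 / (1.0 + t)) -> float:
    c_ts = f(g_ts)          # weighting coefficient c_TS derived from the texture structure
    return c_ts * g_mie     # g_WM = c_TS * g_MIE
```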

### *3.2. Adaptive Sampling with Variance and Local Salient Factor*

Selecting a feature to distinguish detail blocks from smooth blocks is very important in the process of adaptive sampling. Information entropy, variance, and local standard deviation are often used individually as such criteria. However, each of these features has shortcomings when used alone: information entropy only reflects the probability distribution of gray values, variance only captures the degree of dispersion of the pixels, and local standard deviation only focuses on the spatial distribution of the pixels. Secondly, in the previous literature [28], the adaptive sampling rate is mostly set by segmented adaptive sampling instead of step-less adaptive sampling, which leads to discontinuity of the sampling rate and inadequate utilization of the distinguishing feature.

In order to overcome the shortcomings of individual features, this paper uses a synthetic feature to distinguish between smooth blocks and detail blocks. The synthetic feature for adaptive sampling is defined as:

$$J(\mathbf{x}\_i) = L(\mathbf{x}\_i)^{\lambda\_1} \times D(\mathbf{x}\_i)^{\lambda\_2} \tag{8}$$

where, *D*(*x<sub>i</sub>*) and *L*(*x<sub>i</sub>*) denote the variance and local salient factor of the *i*-th sub-image, and λ<sub>1</sub> and λ<sub>2</sub> are the corresponding weighting coefficients. The expressions of the variance and local salient factor for the sub-block are as follows:

$$\begin{aligned} D(\mathbf{x}\_i) &= \frac{1}{n} \sum\_{j=1}^n \left( x\_{ij} - \mu\_i \right)^2 \\ L(\mathbf{x}\_i) &= \frac{1}{n} \sum\_{j=1}^n \frac{\sum\_{k=1}^{q} \left| x\_{ij}^k - x\_{ij} \right|}{x\_{ij}} \end{aligned} \tag{9}$$

where, *x<sub>ij</sub>* is the gray value of the *j*-th pixel in the *i*-th sub-image, μ<sub>*i*</sub> is the gray mean of the *i*-th sub-block image, *x<sup>k</sup><sub>ij</sub>* is the gray value of the *k*-th pixel in the salient operator domain around the center pixel *x<sub>ij</sub>*, and *q* represents the number of pixels in the salient operator domain. The synthetic feature *J*(*x<sub>i</sub>*) not only reflects the degree of dispersion and the relative differences of the sub-image pixels, but also incorporates the relationship between sensory amount and physical quantity described by Weber's Law [29].
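The following sketch (an assumed helper, with λ<sub>1</sub> = λ<sub>2</sub> = 0.5 and a 3 × 3 salient operator domain, i.e., *q* = 8, chosen only for illustration) shows how the variance, local salient factor, and synthetic feature of Equations (8) and (9) could be computed for one sub-block.

```python
# Minimal sketch of Equations (8) and (9) for a single sub-block.
import numpy as np

def synthetic_feature(block: np.ndarray, lam1: float = 0.5, lam2: float = 0.5) -> float:
    x = block.astype(np.float64)
    D = x.var()                                   # variance D(x_i)
    padded = np.pad(x, 1, mode="edge")            # edge padding is an assumption
    h, w = x.shape
    acc = 0.0
    for r in range(h):
        for c in range(w):
            center = x[r, c]
            nbhd = padded[r:r + 3, c:c + 3]       # 3 x 3 salient operator domain
            diff = np.abs(nbhd - center).sum()    # center contributes |x - x| = 0
            acc += diff / max(center, 1.0)        # guard against zero gray values
    L = acc / x.size                              # local salient factor L(x_i)
    return (L ** lam1) * (D ** lam2)              # J(x_i) = L^lambda_1 * D^lambda_2
```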

In order to avoid the disadvantage of segmented adaptive sampling, step-less adaptive sampling is adopted in this paper [30–32]. The key point of step-less adaptive sampling is how to accurately select a continuous sampling rate based on the synthetic feature. The selection of continuous sampling rates is achieved by setting a sampling rate factor (η<sub>SR</sub>) based on the relationship between the sensory amount and the physical quantity in Weber's Law. The sampling rate factor (η<sub>SR</sub>) and the step-less adaptive sampling rate (ω<sub>SR</sub>) are defined as follows:

$$\eta\_{\text{SR}}(\mathbf{x}\_i) = \frac{\log\_2 J(\mathbf{x}\_i)}{\frac{1}{T\_1} \sum\_{j=1}^{T\_1} \log\_2 J(\mathbf{x}\_j)} \tag{10}$$

$$\omega\_{\text{SR}}(\mathbf{x}\_i) = \eta\_{\text{SR}}(\mathbf{x}\_i) \times TSR \tag{11}$$

where, *TSR* is the total sampling rate of the whole image.
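A minimal sketch of Equations (10) and (11): given the synthetic features of all *T*<sub>1</sub> sub-blocks, the sampling rate factors average to one, so the per-block rates average to the total sampling rate *TSR*. The function name is assumed, and the sketch presumes *J*(*x<sub>i</sub>*) > 1 so that all logarithms are positive.

```python
# Minimal sketch of Equations (10) and (11): step-less per-block sampling rates.
import numpy as np

def stepless_sampling_rates(J: np.ndarray, TSR: float) -> np.ndarray:
    """J: synthetic feature values J(x_i) of the T1 sub-blocks (assumed > 1)."""
    logJ = np.log2(J)
    eta = logJ / logJ.mean()        # sampling rate factor eta_SR(x_i), Eq. (10)
    return eta * TSR                # per-block rate omega_SR(x_i), Eq. (11)
```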

### *3.3. Error Estimation and Iterative Stop Criterion in Reconstruction Process*

The goal of the reconstruction process is to provide a good estimate of the original signal:

$$\mathbf{x}^\* = \left[ x\_1^\*, x\_2^\*, \dots, x\_N^\* \right]^T, \quad \mathbf{x}^\* \in \mathbb{R}^N. \tag{12}$$

Given the noisy observed output (ỹ) and the finite sparsity level (*K*), the performance of reconstruction is usually measured by the similarity or the error function between *x*<sup>∗</sup> and *x*. In addition, the reconstruction method, whether a convex or a non-convex optimization algorithm, needs to solve the NP-hard problem by linear programming (LP), wherein the number of correlation vectors is crucial. Therefore, error estimation and the selection of the number of correlation vectors are two important factors in reconstruction. Especially in some non-convex optimization restoration algorithms, such as the OMP algorithm, the selected number of correlation vectors is linearly related to the number (*v*) of iterations of the algorithm. These two points (error estimation and optimal iteration) are discussed below.

### 3.3.1. Reconstruction Error Estimation in Noisy Background

In the second section, Equation (1) was used to describe the relationship between the original signal and the noiseless observed signal. However, the actual observation always takes place against a noise background, so the observed signal in this noisy environment is given by the following equation:

$$\widetilde{\mathbf{y}} = \mathbf{\Phi} \mathbf{x} + \mathbf{w} = \mathbf{\Phi} \mathbf{\Psi} \mathbf{s} + \mathbf{w} = \mathbf{\Omega} \mathbf{s} + \mathbf{w} \tag{13}$$

where, ỹ is the observed output in the noisy environment, and *w* is additive white Gaussian noise (AWGN) with zero mean and standard deviation σ<sub>*w*</sub>. The *M*-dimensional AWGN *w* is independent of the signal *x*. Here, we discuss the reconstruction error in two steps: the first step establishes the entire reconstruction model, and the second step derives the corresponding reconstruction error.

The original signal (*x*) itself is not sparse but is *K*-sparse under the sparse basis (Ψ), so we have:

$$\mathbf{s} = \mathbf{\Psi}^{-1}\mathbf{x}, \quad \mathbf{s} = \left[ s\_1, s\_2, \dots, s\_k, \dots, s\_N \right]^T \tag{14}$$

where, [*s*<sub>1</sub>, *s*<sub>2</sub>, ··· , *s<sub>k</sub>*, ··· , *s<sub>N</sub>*]<sup>*T*</sup> is a vector of length *N* which has only *K* non-zero elements, i.e., the remaining *N*−*K* micro elements are zero or much smaller than any of the *K* non-zero elements. Assuming, without loss of generality, that the first *K* elements of the sparse representation *s* are exactly the non-zero elements, we have:

$$\mathbf{s} = \begin{bmatrix} \mathbf{s}\_{\mathbf{K}} \\ \mathbf{s}\_{\mathbf{N}-\mathbf{K}} \end{bmatrix} \tag{15}$$

where, *s*<sub>K</sub> is a *K*-dimensional vector and *s*<sub>N−K</sub> is a vector of length *N*−*K*. The actual observed signal obtained from Equations (13) and (15) can be described as follows:

$$\widetilde{\mathbf{y}} = \mathbf{y} + \mathbf{w} = \mathbf{\Omega}\mathbf{s} + \mathbf{w} = \begin{bmatrix} \mathbf{\Omega}\_K & \mathbf{\Omega}\_{N-K} \end{bmatrix} \begin{bmatrix} \mathbf{s}\_K \\ \mathbf{s}\_{N-K} \end{bmatrix} + \mathbf{w} = \mathbf{\Omega}\_K \mathbf{s}\_K + \mathbf{\Omega}\_{N-K} \mathbf{s}\_{N-K} + \mathbf{w} \tag{16}$$

where, Ω = [ω<sub>1</sub> ··· ω<sub>*K*</sub> ω<sub>*K*+1</sub> ··· ω<sub>*N*</sub>] is an *M* × *N* matrix composed of *N* column vectors of dimension *M*.

In order to estimate the error of the recovery algorithm accurately, we define three error functions using the *l*<sub>2</sub> norm:

$$\text{Original data error}: \ e\_x = \frac{1}{N} \|\mathbf{x} - \mathbf{x}^\*\|\_2^2 \tag{17}$$

$$\text{Observed data error}: \ e\_y = \frac{1}{M} \|y - y^\*\|\_2^2 \tag{18}$$

$$\text{Sparse data error}: \ e\_s = \frac{1}{N} \|\mathbf{s} - \mathbf{s}^\*\|\_2^2 \tag{19}$$

where, *x*<sup>∗</sup>, *y*<sup>∗</sup>, and *s*<sup>∗</sup> represent the reconstructed values of *x*, *y*, and *s*, respectively. The three reconstructed values are obtained by maximum likelihood (ML) estimation using *l*<sub>0</sub> minimization. The number of iterations of the restoration algorithm is *v*, that is, the number of correlation vectors. In addition, in the process of solving *s*<sup>∗</sup> using the pseudo-inverse, which is based on the least-squares algorithm, the value of *v* is smaller than *M*. Using Equations (13) and (15), the expressions of *x*<sup>∗</sup>, *y*<sup>∗</sup>, and *s*<sup>∗</sup> are listed as follows:

$$\mathbf{s}^\* = \begin{bmatrix} \mathbf{s}\_{v}^\* \\ \mathbf{s}\_{N-v}^\* \end{bmatrix} = \begin{bmatrix} \mathbf{s}\_{v}^\* \\ \mathbf{0}\_{N-v} \end{bmatrix} = \begin{bmatrix} \mathbf{\Omega}\_{v}^{+} \widetilde{\mathbf{y}} \\ \mathbf{0}\_{N-v} \end{bmatrix} = \begin{bmatrix} \mathbf{\Omega}\_{v}^{+} (\mathbf{\Omega}\_{v} \mathbf{s}\_{v} + \mathbf{\Omega}\_{N-v} \mathbf{s}\_{N-v} + \mathbf{w}) \\ \mathbf{0}\_{N-v} \end{bmatrix} = \begin{bmatrix} \mathbf{s}\_{v} + \mathbf{\Omega}\_{v}^{+} (\mathbf{\Omega}\_{N-v} \mathbf{s}\_{N-v} + \mathbf{w}) \\ \mathbf{0}\_{N-v} \end{bmatrix} \tag{20}$$

$$\mathbf{x}^\* = \mathbf{\varPsi} \mathbf{s}^\* \tag{21}$$

$$y^\* = \Omega \mathbf{s}^\* \tag{22}$$

where, Ω<sup>+</sup><sub>*v*</sub> is the pseudo-inverse of Ω<sub>*v*</sub>, and its expression is Ω<sup>+</sup><sub>*v*</sub> = (Ω<sup>*T*</sup><sub>*v*</sub>Ω<sub>*v*</sub>)<sup>−1</sup>Ω<sup>*T*</sup><sub>*v*</sub>. Using Equations (20)–(22), the three error functions are rewritten as follows:

$$e\_x = \frac{1}{N} \left\| -\mathbf{\Psi}\_{v} \mathbf{\Omega}\_{v}^{+} (\mathbf{\Omega}\_{N-v} \mathbf{s}\_{N-v} + \mathbf{w}) + \mathbf{\Psi}\_{N-v} \mathbf{s}\_{N-v} \right\|\_{2}^{2} \tag{23}$$

$$e\_y = \frac{1}{M} \left\| -\mathbf{\Omega}\_{v} \mathbf{\Omega}\_{v}^{+} (\mathbf{\Omega}\_{N-v} \mathbf{s}\_{N-v} + \mathbf{w}) + \mathbf{\Omega}\_{N-v} \mathbf{s}\_{N-v} \right\|\_{2}^{2} \tag{24}$$

$$e\_s = \frac{1}{N} \left\| \begin{bmatrix} -\mathbf{\Omega}\_{v}^{+} (\mathbf{\Omega}\_{N-v} \mathbf{s}\_{N-v} + \mathbf{w}) \\ \mathbf{s}\_{N-v} \end{bmatrix} \right\|\_{2}^{2} \tag{25}$$
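To illustrate Equations (20)–(22), here is a minimal sketch (not the authors' implementation) that, given an assumed set of *v* selected columns of Ω (e.g., the atoms chosen by an OMP-type algorithm), recovers *s*<sup>∗</sup> by the least-squares pseudo-inverse and rebuilds *x*<sup>∗</sup> and *y*<sup>∗</sup>; the errors of Equations (17)–(19) then follow by direct substitution.

```python
# Minimal sketch of Equations (20)-(22): least-squares recovery on a given support.
import numpy as np

def partial_reconstruction(Omega: np.ndarray, Psi: np.ndarray,
                           y_tilde: np.ndarray, support: np.ndarray):
    """support: indices of the v columns used in the reconstruction (v < M)."""
    Omega_v = Omega[:, support]
    s_v = np.linalg.pinv(Omega_v) @ y_tilde       # s*_v = Omega_v^+ y~, Eq. (20)
    s_star = np.zeros(Omega.shape[1])
    s_star[support] = s_v                         # remaining N - v entries stay zero
    x_star = Psi @ s_star                         # x* = Psi s*, Eq. (21)
    y_star = Omega @ s_star                       # y* = Omega s*, Eq. (22)
    return s_star, x_star, y_star
```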

According to the definition of Ψ, Ω and the RIP, we know:

$$e\_s = e\_x \tag{26}$$

$$(1 - \delta\_K)\mathbf{e}\_\mathbf{s} \le \mathbf{e}\_\mathbf{y} \le (1 + \delta\_K)\mathbf{e}\_\mathbf{s} \tag{27}$$

where, δ<sub>*K*</sub> ∈ (0, 1) represents a coefficient associated with Ω and *K*. According to the Gershgorin circle theorem [33], δ<sub>*K*</sub> = (*K* − 1)μ(Ω) for all *K* < μ(Ω)<sup>−1</sup>, where μ(Ω) denotes the coherence of Ω:

$$\mu(\Omega) = \max\_{1 \le i < j \le N} \frac{\left| \left\langle \omega\_i, \omega\_j \right\rangle \right|}{\|\omega\_i\|\_2 \|\omega\_j\|\_2}. \tag{28}$$
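A minimal sketch of Equation (28), computing the coherence of Ω as the largest normalized inner product between two distinct columns; the function name is an assumption.

```python
# Minimal sketch of Equation (28): coherence of the matrix Omega.
import numpy as np

def coherence(Omega: np.ndarray) -> float:
    G = Omega / np.linalg.norm(Omega, axis=0, keepdims=True)   # normalize columns
    gram = np.abs(G.T @ G)                                     # |<w_i, w_j>| / (||w_i|| ||w_j||)
    np.fill_diagonal(gram, 0.0)                                # ignore the i = j terms
    return float(gram.max())
```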

Using Equations (26) and (27), the boundaries of the original data error are as follows:

$$\frac{1}{(1+\delta\_K)}e\_y \le e\_x \le \frac{1}{(1-\delta\_K)}e\_y. \tag{29}$$

Therefore, from the above analysis, we can conclude that the three errors are consistent, and minimizing any one of them is equivalent to minimizing the others. Considering the complexity and reliability of the calculation (*e<sub>x</sub>* is too complicated, *e<sub>s</sub>* has insufficient dimensions), *e<sub>y</sub>* is used as the target in the optimization function of the recovery algorithm.

### 3.3.2. Optimal Iterative Recovery of Image in Noisy Background

The optimal iterative recovery of the image discussed in this paper refers to the case where the error function of the image is smallest, which follows from the minimization of *e<sub>y</sub>* in the form of the *l*<sub>2</sub> norm:

$$v\_{opt} = \left\{ v \Big| \operatorname\*{argmin}\_{v} e\_y \right\} \tag{30}$$

$$\operatorname\*{argmin}\_{v} e\_y = \operatorname\*{argmin}\_{v} \frac{1}{M} \| \mathbf{G}\_{v} \mathbf{\Omega}\_{N-v} \mathbf{s}\_{N-v} - \mathbf{C}\_{v} \mathbf{w} \|\_{2}^{2} \tag{31}$$

$$\begin{cases} \ \mathcal{G}\_{\upsilon} = I - \Omega\_{\upsilon} \Omega\_{\upsilon}^{+} \\ \ \mathcal{C}\_{\upsilon} = \Omega\_{\upsilon} \Omega\_{\upsilon}^{+} \end{cases} \tag{32}$$

where *G<sub>v</sub>* is a projection matrix of rank *M* − *v* and *C<sub>v</sub>* is a projection matrix of rank *v*. Since the projection matrices *G<sub>v</sub>* and *C<sub>v</sub>* in Equation (31) are orthogonal to each other, the inner product of the two vectors *G<sub>v</sub>*Ω<sub>*N*−*v*</sub>*s*<sub>*N*−*v*</sub> and *C<sub>v</sub>w* is zero, and therefore:

$$e\_y = \frac{1}{M} \| \mathbf{C}\_{v} \mathbf{w} \|\_{2}^{2} + \frac{1}{M} \| \mathbf{G}\_{v} \mathbf{\Omega}\_{N-v} \mathbf{s}\_{N-v} \|\_{2}^{2} = e\_{y}^{v} + e\_{y}^{s} \tag{33}$$

According to [34], the observed data error *e<sub>y</sub>* is a Chi-square random variable with *v* degrees of freedom, and the expected value and variance of *e<sub>y</sub>* are as follows:

$$\frac{M}{\sigma\_w^2} e\_y \sim \chi\_v^2 \tag{34}$$

$$E(e\_y) = \frac{v}{M} \sigma\_w^2 + \frac{1}{M} \| \mathbf{G}\_{v} \mathbf{\Omega}\_{N-v} \mathbf{s}\_{N-v} \|\_{2}^{2} \tag{35}$$

$$\text{var}(e\_y) = \frac{2v}{M^2} \left(\sigma\_w^2\right)^2 \tag{36}$$

The expected value of *e<sub>y</sub>* has two parts. The first part, (*v*/*M*)σ<sup>2</sup><sub>*w*</sub>, is the noise-related part, which increases with the number *v*. The second part, (1/*M*)‖*G<sub>v</sub>*Ω<sub>*N*−*v*</sub>*s*<sub>*N*−*v*</sub>‖<sup>2</sup><sub>2</sub>, is a function of the unstable micro elements *s*<sub>*N*−*v*</sub>, and it decreases as the number *v* increases. Therefore, the observed data error *e<sub>y</sub>* is normally described as a bias-variance tradeoff.

Due to the existence of the uncertain part *e*<sup>*s*</sup><sub>*y*</sub>, the optimal number of iterations *v<sub>opt</sub>* cannot be found by minimizing *e<sub>y</sub>* directly. As a result, another bias-variance tradeoff, *e*<sup>∗</sup><sub>*y*</sub>, is introduced to provide probabilistic bounds on *e*<sup>*s*</sup><sub>*y*</sub> by using the noisy output ỹ instead of the noiseless output *y*:

$$e\_y^\* = \frac{1}{M} \|\widetilde{\mathbf{y}} - \mathbf{y}^\*\|\_2^2 = \frac{1}{M} \|\mathbf{G}\_{v} \mathbf{\Omega}\_{N-v} \mathbf{s}\_{N-v} + \mathbf{G}\_{v} \mathbf{w}\|\_2^2. \tag{37}$$

According to [35], the second observed data error *e*<sup>∗</sup><sub>*y*</sub> is a Chi-square random variable with *M* − *v* degrees of freedom, and the expected value and variance of *e*<sup>∗</sup><sub>*y*</sub> are as follows:

$$\frac{M}{\sigma\_w^2} e\_y^\* \sim \chi^2\_{M-v} \tag{38}$$

$$E\left(e\_y^\*\right) = \frac{M-v}{M}\sigma\_w^2 + \frac{1}{M} \|\mathbf{G}\_v \mathbf{\Omega}\_{N-v} \mathbf{s}\_{N-v}\|\_2^2 = \frac{M-v}{M}\sigma\_w^2 + e\_y^s \tag{39}$$

$$\mathrm{var}(e\_y^\*) = \frac{2(M-v)}{M^2} \mathrm{(}\sigma\_w^2\text{)}^2 + \frac{4\sigma\_w^2}{M^2} \|G\_v \Omega\_{N-v} \mathbf{s}\_{N-v} \|\_2^2 \tag{40}$$

So, we can derive probabilistic bounds for the observed data error *e<sub>y</sub>* using the probability distributions of the two Chi-square random variables:

$$\underline{e\_y(p\_1, p\_2)} \le e\_y \le \overline{e\_y(p\_1, p\_2)} \tag{41}$$

where, *p*<sub>1</sub> is the confidence probability on the random variable of the observed data error *e<sub>y</sub>*, and *p*<sub>2</sub> is the validation probability on the random variable of the second observed data error *e*<sup>∗</sup><sub>*y*</sub>. As both errors follow Chi-square distributions, a Gaussian distribution can be used to approximate them. Therefore, the confidence probability *p*<sub>1</sub> and the validation probability *p*<sub>2</sub> can be calculated as:

$$p\_1 = Q(\alpha) = \int\_{-\alpha}^{\alpha} \frac{1}{\sqrt{2\pi}} e^{\frac{-x^2}{2}} dx \tag{42}$$

$$p\_2 = Q(\beta) = \int\_{-\beta}^{\beta} \frac{1}{\sqrt{2\pi}} e^{\frac{-x^2}{2}} dx \tag{43}$$

where, α and β denote the tuning parameters of the confidence and validation probabilities, respectively. Furthermore, the worst case is considered when calculating the minimum value of *e<sub>y</sub>*, that is, the minimum value of the upper bound of *e<sub>y</sub>* is calculated:

$$\begin{split} v\_{opt} &= \left\{ v \Big| \operatorname\*{argmin}\_{v} \overline{e\_y} \right\} = \left\{ v \Big| \operatorname\*{argmin}\_{v} \overline{e\_y(p\_1, p\_2)} \right\} = \left\{ v \Big| \operatorname\*{argmin}\_{v} \overline{e\_y(\alpha, \beta)} \right\} \\ &= \left\{ v \Big| \operatorname\*{argmin}\_{v} \left( \frac{2v - M}{M} \sigma\_w^2 + e\_y^\* + \alpha \frac{\sqrt{2v}}{M} \sigma\_w^2 + \beta \operatorname{var}(e\_y^\*) \right) \right\} \end{split} \tag{44}$$

Normally, based on the Akaike information criterion (AIC) or the Bayesian information criterion (BIC), the optimal number of iterations can be chosen as follows:

AIC: Set α = β = 0

$$\overline{e\_y} = \overline{e\_y(\alpha = 0, \beta = 0)} = \left(\frac{2v}{M} - 1\right)\sigma\_w^2 + e\_y^\* \tag{45}$$

BIC: Set α = √*v* log *M* and β = 0.

$$\overline{e\_y} = \overline{e\_y\left(\alpha = \sqrt{v} \log M, \beta = 0\right)} = \left(\frac{(2 + \sqrt{2}\log M)v}{M} - 1\right)\sigma\_w^2 + e\_y^\* \tag{46}$$

where, *e*<sup>∗</sup><sub>*y*</sub> can be calculated from the noisy observation data and the reconstruction algorithm.
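As an illustration only (not the authors' pseudocode), the following sketch shows how the AIC form of the stop criterion in Equation (45) might be embedded in an OMP-style loop: at each iteration *v*, the bound (2*v*/*M* − 1)σ<sup>2</sup><sub>*w*</sub> + *e*<sup>∗</sup><sub>*y*</sub> is evaluated from the current residual, and the *v* with the smallest bound is kept as *v<sub>opt</sub>*; the noise variance σ<sup>2</sup><sub>*w*</sub> and the cap `v_max` are assumed inputs.

```python
# Minimal sketch: OMP-style recovery with the AIC stop criterion of Eq. (45).
import numpy as np

def omp_with_aic_stop(Omega: np.ndarray, y_tilde: np.ndarray,
                      sigma_w2: float, v_max: int):
    M, N = Omega.shape
    residual = y_tilde.copy()
    support, best = [], (np.inf, None, 0)          # (bound, s*, v_opt)
    for _ in range(min(v_max, M)):
        # Standard OMP atom selection on the current residual.
        idx = int(np.argmax(np.abs(Omega.T @ residual)))
        if idx not in support:
            support.append(idx)
        Omega_v = Omega[:, support]
        s_v = np.linalg.pinv(Omega_v) @ y_tilde    # least-squares on the support
        residual = y_tilde - Omega_v @ s_v
        e_y_star = float(residual @ residual) / M  # second observed data error, Eq. (37)
        v = len(support)
        bound = (2 * v / M - 1) * sigma_w2 + e_y_star   # Eq. (45), alpha = beta = 0
        if bound < best[0]:
            s_star = np.zeros(N)
            s_star[support] = s_v
            best = (bound, s_star, v)
    return best[1], best[2]                        # reconstructed s* and v_opt
```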

### 3.3.3. Application of Error Estimation on BCS

The proposed algorithm (FE-ABCS) is based on block-compressive sensing, so the optimal number of iterations (*v<sub>opt</sub>*) derived in the above section requires a block-wise variant in order to be applied to the proposed algorithm:

$$\begin{split} v\_{opt}^{i} &= \left\{ v^{i} \Big| \operatorname\*{argmin}\_{v^{i}} \overline{e\_{y^{i}}} \right\} = \left\{ v^{i} \Big| \operatorname\*{argmin}\_{v^{i}} \overline{e\_{y^{i}}(p\_{1}^{i}, p\_{2}^{i})} \right\} = \left\{ v^{i} \Big| \operatorname\*{argmin}\_{v^{i}} \overline{e\_{y^{i}}(\alpha\_{i}, \beta\_{i})} \right\} \\ &= \left\{ v^{i} \Big| \operatorname\*{argmin}\_{v^{i}} \left( \frac{2v^{i} - m^{i} + \alpha\_{i}\sqrt{2v^{i}}}{m^{i}} \sigma\_{w^{i}}^{2} + \beta\_{i} \operatorname{var}(e\_{y^{i}}^{\*}) + e\_{y^{i}}^{\*} \right) \right\} \end{split} \tag{47}$$

where, *i* = 1, 2, ··· , *T*<sub>1</sub> represents the serial number of the sub-images. Similarly, the values of α<sub>*i*</sub> and β<sub>*i*</sub> can be chosen according to the AIC and BIC criteria.
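A short sketch of how Equation (47) could be applied block-wise, reusing the `omp_with_aic_stop` sketch given after Equation (46); the per-block measurement matrices, observations, and noise variances are assumed inputs.

```python
# Minimal sketch of Equation (47): apply the stop criterion to every sub-image.
def reconstruct_all_blocks(Omegas, y_tildes, sigma_w2s, v_max=64):
    results = []
    for Omega_i, y_i, s2_i in zip(Omegas, y_tildes, sigma_w2s):   # i = 1, ..., T1
        s_star_i, v_opt_i = omp_with_aic_stop(Omega_i, y_i, s2_i, v_max)
        results.append((s_star_i, v_opt_i))
    return results
```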

### **4. The Proposed Algorithm (FE-ABCS)**

With the knowledge presented in the previous section, the altered ABCS (FE-ABCS) is proposed for the recovery of block sparse signals in noiseless or noisy backgrounds. The workflow of the proposed algorithm is presented in Section 4.1, while the specific parameter settings of the proposed algorithm are introduced in Section 4.2.

### *4.1. The Workflow and Pseudocode of FE-ABCS*

In order to better express the idea of the proposed algorithm, the workflows of the typical BCS algorithm and the FE-ABCS algorithm are presented, as shown in Figure 2.

**Figure 2.** The workflow of two block-compressive sensing (BCS) algorithms. (**a**) Typical BCS algorithm, (**b**) FE-ABCS algorithm.

According to Figure 2, compared with the traditional BCS algorithm, the main innovations of this paper can be reflected in the following points:

• Flexible partitioning: using the weighted MIE as the block basis to reduce the average complexity of the sub-images from the pixel domain and the spatial domain;

• Adaptive sampling: using the synthetic feature (variance combined with the local salient factor) and the step-less sampling rate factor to allocate the total sampling rate between smooth and detail blocks;

• Optimal iterative reconstruction: using the error estimation in the noisy background to determine the optimal number of iterations (the iterative stop criterion) for each sub-image.
Furthermore, since the FE-ABCS algorithm is based on an iterative recovery algorithm, especially a non-convex optimization iterative recovery algorithm, this paper uses the OMP algorithm as the basic comparison algorithm without loss of generality. The full pseudocode of the proposed algorithm is presented as follows.


### *4.2. Specific Parameter Setting of FE-ABCS*
