#### *2.2. Determination of Sub-Region Weights*

The traditional SPM models use all the extracted local descriptors for image representation, so some descriptors may belong to the background rather than to the object to be recognized. It was therefore required to learn spatial weights that boost the target parts and weaken the background parts. From the sparse-representation viewpoint, the key observation is that the target parts admit a sparser representation than the background ones. The feature vectors of the sub-regions at every pyramid level were weighted on a residual basis to determine their dependability [21], which leads to the following optimization problem [10]:

$$\hat{x}\_{k} = \operatorname{argmin}\_{x\_{k}} \left\{ ||F\_{k}x\_{k} - y\_{k}||\_{2}^{2} + \lambda ||x\_{k}||\_{1} \right\}, \quad 1 \le k \le K \tag{3}$$

where *yk* is the pooled feature vector of the *k*th sub-region, *Fk* is the corresponding dictionary, and *λ* is the regularization parameter balancing the reconstruction error against the sparsity of *xk*.
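The per-sub-region sparse coding in Eq. (3) is a standard ℓ1-regularized least-squares (lasso) problem. As a minimal sketch (the function name `ista_lasso` and all parameter values are ours, not the paper's), it can be solved with plain ISTA in NumPy:

```python
import numpy as np

def ista_lasso(F, y, lam=0.1, n_iter=1000):
    """Solve min_x ||F x - y||_2^2 + lam * ||x||_1 (Eq. 3) via ISTA."""
    # Step size 1/L, where L is the Lipschitz constant of the gradient.
    L = 2.0 * np.linalg.norm(F, 2) ** 2
    x = np.zeros(F.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * F.T @ (F @ x - y)                          # gradient of the quadratic term
        z = x - grad / L                                        # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)   # soft-thresholding (prox of lam*||.||_1)
    return x
```

Any lasso solver would do here; ISTA is shown only because it is short and self-contained.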

The residuals of a sub-region can be measured simply by using the L2-norm as follows:

$$r\_{kc} = ||F\_k \delta\_c(\hat{x}\_k) - y\_k||\_2 \tag{4}$$

where *rkc* is the residual corresponding to the *c*th class of the *k*th sub-region, *δc*(*x*ˆ*k*) is the selector function that keeps the elements of *x*ˆ*k* associated with the *c*th class and sets the rest to zero, *c* = 1, 2, ··· , *C*, and *C* is the number of classes in the training dataset.
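Given the sparse code, the class-wise residuals of Eq. (4) only require zeroing the coefficients outside each class before reconstructing. A sketch, assuming each dictionary atom carries an integer class label (the name `class_residuals` and the `class_of_atom` encoding are our assumptions):

```python
import numpy as np

def class_residuals(F, x_hat, y, class_of_atom, C):
    """Residual r_kc = ||F delta_c(x_hat) - y||_2 for each class c (Eq. 4).

    class_of_atom[j] is the class label (0..C-1) of column j of F;
    delta_c keeps only the coefficients of class c and zeros the rest.
    """
    r = np.empty(C)
    for c in range(C):
        x_c = np.where(class_of_atom == c, x_hat, 0.0)  # the delta_c selector
        r[c] = np.linalg.norm(F @ x_c - y)
    return r
```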

All the residuals of the *k*th sub-region were concatenated to generate the residual vector *rk* as follows:

$$r\_k = [r\_{k1}, r\_{k2}, \dots, r\_{kC}].\tag{5}$$

According to the theory of SR [10], a target sub-region can be accurately represented only by the training samples of its own class; therefore, its residual vector contains exactly one small element. A background sub-region, in contrast, lies far from every subspace spanned by the training samples of a single class but approximately within the subspace spanned by all training samples of all classes; its residual vector therefore contains elements of similar magnitude. Based on this analysis, the following function was applied for evaluating the sub-region sparsity:

$$
\varsigma\_k = \frac{\min(r\_k)}{\text{mean}(r\_k)}.\tag{6}
$$

When *rk* contains a single zero or near-zero residual, *ςk* reaches its minimum value of 0, indicating that the sub-region is well represented by one class subspace. When all residuals in *rk* are nonzero and equal, *ςk* reaches its maximum value of 1, indicating that the sub-region may contain noise or belong to the background.
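The two extreme cases of Eq. (6) can be checked directly in a few lines (the function name `sparsity` is ours):

```python
import numpy as np

def sparsity(r):
    """Sub-region sparsity measure of Eq. (6): min(r) / mean(r) in [0, 1]."""
    r = np.asarray(r, dtype=float)
    return float(np.min(r) / np.mean(r))
```

A target-like residual vector with one near-zero entry scores near 0, while a flat background-like vector scores near 1, mirroring the discussion above.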

This is verified by the numerical results shown in Figure 2. Figure 2a,b are example images illustrating the distribution of sub-regions in the construction of a three-level pyramid, and Figure 2c,d show *ςk* for the sub-regions of Figure 2a,b, respectively. In the third pyramid level, the residual sparsity *ςk* of sub-regions located in the target parts (such as the sub-regions marked 11 and 12 in Figure 2a) was smaller than that of the background parts (such as the sub-regions marked 9 and 10 in Figure 2a). In addition, the residuals differed across resolutions: they were much larger in the low-resolution level than in the high-resolution ones, particularly for the target parts, presumably because a larger region includes more background clutter and speckle noise. Therefore, the residual-based sparsity can be employed to differentiate the target sub-regions from the background ones and to weight the feature vectors accordingly.

A simple way to tackle this problem is to weaken the feature representation of the background sub-regions before classification. This was achieved by weighting each sub-region with the reciprocal of its sparsity, *ωk*, which is viewed as its dependability:

$$
\omega\_k = \frac{1}{\varsigma\_k}. \tag{7}
$$

**Figure 2.** Illustration of the numerical results of the sub-region sparsity. (**a**,**b**) are example images illustrating the distribution of sub-regions in the construction of the three-level pyramid. (**c**,**d**) show the *ςk* of the sub-regions corresponding to the images (**a**,**b**).

#### *2.3. Weighted SPM Sparse Representation*

After weighting, the sub-regions were integrated to reconstruct a global feature vector according to the SPM model. Each sub-region vector *yk* was multiplied by its corresponding weight to obtain the sub-region weighted query vector:

$$y\_k^{\omega} = \omega\_k y\_k. \tag{8}$$

All the sub-regions' weighted query vectors were concatenated to produce the weighted feature vector

$$y^{\omega} = \left[ y\_1^{\omega T}, y\_2^{\omega T}, \dots, y\_K^{\omega T} \right]^T. \tag{9}$$
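Equations (7)-(9) amount to scaling each pooled sub-region vector by the reciprocal of its sparsity and concatenating the results. A sketch (the function name and the `eps` guard against division by zero are our own additions, not part of the paper):

```python
import numpy as np

def weighted_global_vector(sub_vectors, sparsities, eps=1e-8):
    """Weight each sub-region vector y_k by omega_k = 1/sparsity (Eqs. 7-8)
    and concatenate into one global feature vector (Eq. 9).

    eps is a numerical safeguard for sparsity values of exactly 0."""
    weighted = [y_k / (s_k + eps) for y_k, s_k in zip(sub_vectors, sparsities)]
    return np.concatenate(weighted)
```

Sub-regions with small sparsity (likely targets) are amplified, while flat, background-like sub-regions (sparsity near 1) are left essentially unchanged.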

Suppose that *y<sup>ω</sup>* can be represented linearly over *F* as follows:

$$y^{\omega} = F\mu^{\omega}.\tag{10}$$

To recover the label information carried by the training dictionary from the weighted query feature vector, the SR classification method [13] was applied to complete the target image classification:

$$\hat{\mu}^{\omega} = \operatorname{argmin}\_{\mu^{\omega}} \left\{ ||F\mu^{\omega} - y^{\omega}||\_2^2 + \lambda ||\mu^{\omega}||\_1 \right\}. \tag{11}$$

Such a linear approximation [15] is capable of selecting the significant training vectors and thus provides good discrimination among all the classes.
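The section stops at the sparse code of Eq. (11); in standard SRC [13], the final label is then assigned to the class whose atoms yield the smallest reconstruction residual for the query. A sketch of that conventional decision rule (the function name and label encoding are ours):

```python
import numpy as np

def src_classify(F, mu_hat, y_w, class_of_atom, C):
    """SRC decision rule: pick the class whose class-restricted
    reconstruction F * delta_c(mu_hat) best approximates y_w."""
    residuals = [
        np.linalg.norm(F @ np.where(class_of_atom == c, mu_hat, 0.0) - y_w)
        for c in range(C)
    ]
    return int(np.argmin(residuals))
```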
