2.2.2. Problem Analysis

**Sliding window problem.** Unlike the classical CEM algorithm, SBS-CEM does not need the sample vectors of the full image to compute the correlation matrix. Instead, it uses a local region of the image, defined by a sliding window, to capture the local statistics. The window size is fixed at *L*<sup>2</sup> (the square of the number of spectral bands). Because the size is fixed, each update of the inverse matrix must apply the compute-intensive Sherman-Morrison formula twice, as shown in Equations (7) and (8). However, according to our experimental results, the extra calculation of Equation (8) does not appreciably improve target detection accuracy.
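To make the cost of a fixed-size window concrete, the following minimal sketch performs one window step with two rank-one updates. Equations (7) and (8) are not reproduced in this section, so the sketch assumes that one of them adds the newest pixel and the other removes the oldest pixel through the same Sherman-Morrison identity with the sign flipped. The function names `rank_one_inverse` and `slide_window` and the toy values are illustrative only.

```python
def rank_one_inverse(U, x, sign):
    """Return (S + sign * x x^T)^{-1} given U = S^{-1} (Sherman-Morrison)."""
    n = len(x)
    Ux = [sum(U[r][k] * x[k] for k in range(n)) for r in range(n)]  # S^{-1} x
    xU = [sum(x[k] * U[k][c] for k in range(n)) for c in range(n)]  # x^T S^{-1}
    denom = 1.0 + sign * sum(x[k] * Ux[k] for k in range(n))
    return [[U[r][c] - sign * Ux[r] * xU[c] / denom for c in range(n)]
            for r in range(n)]


def slide_window(U, x_new, x_old):
    """One sliding-window step: the formula is applied twice."""
    U = rank_one_inverse(U, x_new, +1.0)     # add the newest pixel
    return rank_one_inverse(U, x_old, -1.0)  # remove the oldest pixel


# Toy 2-band example (arbitrary values, not taken from the experiments).
U0 = [[0.5, 0.0], [0.0, 0.5]]              # inverse of S = 2 * I
a, b = [1.0, 2.0], [3.0, -1.0]
U_a = rank_one_inverse(U0, a, +1.0)         # statistics containing pixel a
U_b = slide_window(U_a, x_new=b, x_old=a)   # pixel b replaces pixel a
U_b_direct = rank_one_inverse(U0, b, +1.0)  # same result with a single update
```

In this sketch, sliding from `a` to `b` costs two rank-one updates, whereas accumulating `b` alone costs one. This doubling is the overhead that the fixed window size imposes.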

**Data dependency problem.** Updating the inverse matrix involves a data dependency: the calculation of **S**<sup>−1</sup><sub>*n*+1</sub> cannot start until **S**<sup>−1</sup><sub>*n*</sub> is available. SBS-CEM divides the update of the inverse matrix into several stages to reduce its complexity. Unfortunately, because of this dependency, each update must wait until all stages of the previous update have finished, so the time of several stages is spent waiting. This overhead is the major bottleneck of the data throughput of SBS-CEM.

## **3. Algorithm Optimization**

#### *3.1. Principle of Algorithm Optimization*

To solve the problems of SBS-CEM described above, this section proposes an optimized algorithm. Its two main improvements eliminate the sliding window and remove the data dependency.

**Non-sliding window.** Unlike SBS-CEM, we do not use a sliding window to update the inverse matrix. Since the oldest pixel no longer has to be moved out, Equation (8) can be removed, which eliminates a large number of calculations. When a new pixel vector **x**<sub>*n*</sub> is loaded, the output **S**<sup>−1</sup><sub>*n*+1</sub> is obtained by Equation (12).

$$\mathbf{S}\_{n+1}^{-1} = \left(\mathbf{S}\_{n} + \mathbf{x}\_{n} \mathbf{x}\_{n}^{\mathrm{T}}\right)^{-1} = \mathbf{S}\_{n}^{-1} - \frac{\mathbf{S}\_{n}^{-1} \mathbf{x}\_{n} \mathbf{x}\_{n}^{\mathrm{T}} \mathbf{S}\_{n}^{-1}}{\mathbf{x}\_{n}^{\mathrm{T}} \mathbf{S}\_{n}^{-1} \mathbf{x}\_{n} + 1} \tag{12}$$
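The update in Equation (12) can be checked numerically. The sketch below is a minimal illustration in plain Python lists, not the hardware implementation. The names `sherman_morrison_update` and `matmul`, the value of `beta`, and the 2-band pixel are assumptions chosen for the example. It multiplies the explicitly formed **S**<sub>*n*+1</sub> by the updated inverse and expects the identity matrix.

```python
def sherman_morrison_update(U, x):
    """Return (S + x x^T)^{-1} given U = S^{-1}, as in Equation (12)."""
    n = len(x)
    Ux = [sum(U[r][k] * x[k] for k in range(n)) for r in range(n)]  # S^{-1} x
    xU = [sum(x[k] * U[k][c] for k in range(n)) for c in range(n)]  # x^T S^{-1}
    denom = sum(x[k] * Ux[k] for k in range(n)) + 1.0               # x^T S^{-1} x + 1
    return [[U[r][c] - Ux[r] * xU[c] / denom for c in range(n)]
            for r in range(n)]


def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


beta = 0.5                                   # arbitrary value for the example
U0 = [[beta, 0.0], [0.0, beta]]              # S_0^{-1} = beta * I, so S_0 = I / beta
x = [1.0, 2.0]                               # toy 2-band pixel vector
U1 = sherman_morrison_update(U0, x)

# Form S_1 = S_0 + x x^T explicitly and verify that S_1 * U1 is the identity.
S1 = [[(1.0 / beta if r == c else 0.0) + x[r] * x[c] for c in range(2)]
      for r in range(2)]
product = matmul(S1, U1)
```

No matrix inversion appears in the update itself: each new pixel costs only matrix-vector products and one division.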

**Data segmentation for deep pipeline.** As mentioned above, the SBS-CEM algorithm computes the inverse-matrix updates serially. The data dependency between **S**<sup>−1</sup><sub>*n*+1</sub> and **S**<sup>−1</sup><sub>*n*</sub> greatly increases the processing time. To solve this problem, we complete the computation of Equation (12) in four stages and pipeline them. However, the inverse-matrix updates of adjacent pixels are not independent, which prevents deep pipelining: a deep pipelined design requires that there be no feedback or iteration among the stages. We therefore remove the data dependency by means of data segmentation, so that the current input pixel can be processed directly, without waiting for the previous pixel to complete. By making the inverse calculations of neighbouring pixels independent, we can build a deep pipelined architecture, which in theory achieves an 8× speed-up over SBS-CEM.
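One way to picture the data segmentation is to list which pixels update which inverse matrix. The sketch below assumes the round-robin rule *bn* = *i* % *M* with *M* = 4 used in Algorithm 1. The function name `bank_schedule` and the choice of ten pixels are illustrative only.

```python
def bank_schedule(num_pixels, num_banks=4):
    """Group 1-based pixel indices by the inverse matrix they update (bn = i % M)."""
    banks = [[] for _ in range(num_banks)]
    for i in range(1, num_pixels + 1):
        banks[i % num_banks].append(i)
    return banks


schedule = bank_schedule(10)
for bn, chain in enumerate(schedule):
    print(f"inverse matrix {bn}: pixels {chain}")
```

Under this rule, consecutive pixels always update different inverse matrices. Pixel *i* depends only on pixel *i* − *M*, so a new pixel can enter the pipeline while the updates of its immediate predecessors are still in flight.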

Table 1, derived from an evaluation of the hardware calculation, shows that the number of computations differs from stage to stage, but that after parallelization each stage consumes approximately the same number of clock cycles. Here, **x**<sub>*n*</sub>, with **P** = **x**<sup>T</sup><sub>*n*</sub>, represents a column of **X**; *T* and *Q* are scalars; and **S**<sup>−1</sup><sub>*n*</sub>, an *L* × *L* matrix, is denoted by **U**. The detailed procedure of the DPBS-CEM algorithm is given in Algorithm 1.


**Table 1.** Four stages of the inverse matrix update.
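As a hedged illustration of how Equation (12) might be split into four stages, the sketch below computes the update in four successive steps. The stage boundaries and the names `row`, `denom` and `scale` are assumptions of this sketch and are not taken from Table 1. It also assumes that **U** is symmetric, which holds when **U** starts as *β* · **I** and is only modified by Equation (12).

```python
def staged_update(U, x):
    """Equation (12) computed in four steps; assumes U is symmetric."""
    n = len(x)
    # Stage 1: row vector x^T U (n * n multiply-accumulates).
    row = [sum(x[k] * U[k][c] for k in range(n)) for c in range(n)]
    # Stage 2: scalar denominator x^T U x + 1.
    denom = sum(row[k] * x[k] for k in range(n)) + 1.0
    # Stage 3: scalar reciprocal, so that stage 4 needs no division.
    scale = 1.0 / denom
    # Stage 4: rank-one correction; U x equals row^T because U is symmetric.
    return [[U[r][c] - scale * row[r] * row[c] for c in range(n)]
            for r in range(n)]


U0 = [[0.5, 0.0], [0.0, 0.5]]       # beta * I with beta = 0.5 (arbitrary)
U1 = staged_update(U0, [1.0, 2.0])  # toy 2-band pixel vector
```

In this split, stages 1 and 4 perform on the order of *L*<sup>2</sup> operations, while stages 2 and 3 produce scalars. This is consistent with the observation that the stages differ in computation count and need parallelization to take a similar number of clock cycles.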

**Algorithm 1** The deep pipelined background statistics (DPBS) target detection CEM algorithm

**Input:** Initialize the following parameters.

(1) HSI data size: *W* × *H* × *L* = *N* × *L*;

(2) the value of *β*;

(3) the desired signature **d**;

(4) the number of inverse matrices: *M* = 4;

(5) *bn* indicates the index of the inverse matrix currently in use;

(6) *K* indicates the number of pixel vectors collected before starting target detection;

**Output:** the final target detection results.

define the initial inverse matrix: **S**<sup>−1</sup><sub>0</sub> = *β* · **I**

data segmentation:

**for** *i* = 1; *i* ≤ *N* + *K*; *i*++ **do**

*bn* = *i* % *M*

calculate the inverse matrix:

**if** *i* ≤ *N* **then**

$$\left(\mathbf{S}^{-1}\right)^{\mathrm{bn}} = \left(\mathbf{S}^{\mathrm{bn}} + \mathbf{x}\_{i}\mathbf{x}\_{i}^{\mathrm{T}}\right)^{-1} = \left(\mathbf{S}^{-1}\right)^{\mathrm{bn}} - \frac{\left(\mathbf{S}^{-1}\right)^{\mathrm{bn}}\mathbf{x}\_{i}\mathbf{x}\_{i}^{\mathrm{T}}\left(\mathbf{S}^{-1}\right)^{\mathrm{bn}}}{\mathbf{x}\_{i}^{\mathrm{T}}\left(\mathbf{S}^{-1}\right)^{\mathrm{bn}}\mathbf{x}\_{i} + 1}$$

**endif**

calculate the target detection results:

**if** *i* ≥ *K* **then**

$$\mathrm{DPBS\text{-}CEM}\left(\mathbf{x}\_{i-K}\right) = \frac{\mathbf{x}\_{i-K}^{\mathrm{T}}\left(\mathbf{S}^{-1}\right)^{\mathrm{bn}}\mathbf{d}}{\mathbf{d}^{\mathrm{T}}\left(\mathbf{S}^{-1}\right)^{\mathrm{bn}}\mathbf{d}}$$

**endif**

**endfor**
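The loop of Algorithm 1 can be sketched in software as follows. This is a minimal, sequential illustration and not the pipelined hardware design. The function names, the toy pixel values, and the default `K=4` are assumptions. The sketch also starts detection at *i* = *K* + 1 rather than *i* = *K*, because with 1-based pixel indices **x**<sub>*i*−*K*</sub> does not exist for *i* = *K*.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


def update_inverse(U, x):
    """Sherman-Morrison update; U is symmetric, so U x also gives x^T U."""
    Ux = matvec(U, x)
    denom = dot(x, Ux) + 1.0
    n = len(x)
    return [[U[r][c] - Ux[r] * Ux[c] / denom for c in range(n)]
            for r in range(n)]


def dpbs_cem(pixels, d, beta, M=4, K=4):
    """Return one detection score per pixel, following the loop of Algorithm 1."""
    n_bands, N = len(d), len(pixels)
    init = [[beta if r == c else 0.0 for c in range(n_bands)]
            for r in range(n_bands)]
    banks = [[row[:] for row in init] for _ in range(M)]  # M copies of beta * I
    scores = []
    for i in range(1, N + K + 1):
        bn = i % M                                         # data segmentation
        if i <= N:                                         # update inverse matrix bn
            banks[bn] = update_inverse(banks[bn], pixels[i - 1])
        if i - K >= 1:                                     # assumption: skip i = K
            x = pixels[i - K - 1]
            Ud = matvec(banks[bn], d)
            scores.append(dot(x, Ud) / dot(d, Ud))
    return scores


d = [1.0, 0.0, 1.0]                                        # toy desired signature
pixels = [[0.2, 0.9, 0.1], [0.3, 0.8, 0.2], d, [0.1, 1.0, 0.1],
          [0.2, 0.7, 0.3], [0.3, 0.9, 0.1], [0.2, 0.8, 0.2], [0.1, 0.9, 0.2]]
scores = dpbs_cem(pixels, d, beta=1.0)
```

A pixel equal to the desired signature **d** receives a score of 1 by construction of the detector, which gives a quick sanity check on the third pixel of the toy data.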