**1. Introduction**

Hyperspectral remote sensing imaging acquires three-dimensional (3D) data including two spatial dimensions with space information of pixels and one spectral dimension with high-dimensional reflectance vectors [1]. The rich spectral information provided by HSI is very useful and has been widely used in a range of various applications such as ecology [2], agriculture [3], environmental [4] and geology [5], where target detection plays a crucial role [6–9]. There are many algorithms have been developed for target detection in HSI [1], such as matched filter (MF) [10], spectral angle mapper (SAM) [11], constrained energy minimization (CEM) [12], target-constrained interference-minimized filter (TCIMF) [13], adaptive coherence estimator (ACE) [14], matched subspace detector (MSD) [15], orthogonal subspace projection (OSP) [16], and sparsity-based target detector (STD) [17]. Among them, CEM along with its variants have been widely used for hyperspectral target detection. The effectiveness of CEM has been shown successfully in many applications such as reconnaissance, rescue, search and on-orbit processing [6]. For such applications, the high-speed data processing of HSI is generally required for finding targets on a timely basis. However, as a trade-off the volume of data in HSI has also become unmanageable with increasing spectral and spatial resolutions. In this case, CEM needs a significantly large number of complex matrix computations. The algorithm could be implemented on one of the widely used platforms like CPUs, GPUs [18], and FPGAs [19]. Among them, FPGAs have significant advantages in supporting parallel computation and customizable deep pipeline operation with the cost of low power consumption.

Currently, grea<sup>t</sup> progress has been made in implementing target detection algorithms on FPGAs. For example, Chang described a new FPGA design by using the Coordinate Rotation Digital Computer (CORDIC) algorithm to solve the matrix inversion problem of the classical CEM [20]. This method of computing the inverse of a large matrix is unable to support fast target detection. Yang utilized Streaming Background Statistics (SBS) structure with an idea of continuously updating the inverse of the correlation matrix on FPGA [21]. Despite that a pixel-by-pixel processing design is realized using fewer hardware resources, its data processing speed is not high. Recently, Gonzalez C. et al. proposed an FPGA implementation of the automatic target-generation process based on an orthogonal subspace projector (ATGP-OSP) using the pseudoinverse operation [22], where Gauss-Jordan elimination method was selected for computing the inverse of a small square matrix whose size is no more than 32 × 32. Unfortunately, the Gauss-Jordan elimination method consumes too much logic resources for solving the large matrix inversion problem required by the CEM.

Although FPGAs gain much attention, it is still not widely deployed for accelerating many algorithms that require high computational complexity such as CEM. The main reason is that the conventional development methods of FPGAs, which are based on register transfer level (RTL) hardware description, are much more difficult than that of CPUs or GPUs. It commonly requires grea<sup>t</sup> efforts in achieving highly efficient results on FPGAs. Furthermore, a design method based on RTL for FPGAs lacks portability and flexibility compared to those based on C/C++ for CPUs or GPUs. To close this gap, FPGA vendors and developers have begun to take advantage of high-level synthesis (HLS) to work on FPGA applications. HLS is able to convert high abstraction languages such as C, C++ and SystemC into VHDL/Verilog hardware description language (HDL) for RTL-level circuit design. According to the user-defined constraints and C code style, the efficiency of the converted RTL designs are quite different. Until now, studies on the acceleration of hyperspectral data processing algorithms with HLS are already available. Santos proposed a novel adaptive and predictive algorithm for lossy hyperspectral image compression algorithm [23] and Lossy Compression for Exomars (LCE) algorithm [24] described in Vivado HLS. Domingo R. et al. proposed a hyperspectral image spatial-spectral classifier accelerator using Intel FPGA SDK for OpenCL [25]. What's more, HLS is popularly utilized for hardware acceleration of deep learning algorithms like convolution neural networks (CNN) [26,27]. However, no research work has been reported for implementing CEM on FPGA using HLS.

In this work, a DPBS-CEM algorithm is developed to be implemented on FPGA using HLS for real-time hyperspectral target detection. Like SBS-CEM, the inverse matrix is gradually updated according to a Sherman-Morrison formula [28]. Different from using sliding windows, DPBS-CEM takes advantage of cumulative windows instead to greatly reduce the number of calculations. As for the issue of removing data dependency in updating inverse matrices, separate memories are proposed to store the results of the successive inverse matrices, which make sure the operations on adjacent pixels can be processed independently. As a consequence, a deep pipelined implementation of DPBS-CEM can be further developed, which has an extraordinary performance improvement in terms of data throughput. Experimental results demonstrate that the proposed algorithm has the capability of operating at a high-speed rate of more than 200 MHz on FPGA. Setting the same clock frequency, the algorithm can also achieve a significant speed-up of near 7.3× than SBS-CEM [21] with no compromise for detection accuracy.

The contributions of this paper can be summarized as follows.


The remainder of this paper is organized as follows. Section 2 briefly discusses CEM and SBS-CEM used for target detection. Section 3 describes the principle of DPBS-CEM in grea<sup>t</sup> detail. The FPGA implementation of DPBS-CEM is presented in Section 4. Section 5 conducts a detailed performance analysis via extensive experiments. Finally, conclusions along with some remarks were drawn in Section 6.
