*4.3. The Full Tier 1 Coder*

By chaining the BPC and the MQ-coder together, the tier 1 coder for JPEG2000 is formed. The basic segmentation comprises two stages for the BPC coder and four for the MQ-coder. The stages are joined by multiple FIFOs, which help maintain a constant flow of data:


The full pipeline, taking the different queues into account, has a total of 15 stages, as seen in Figure 12. Despite the number of stages, this has a negligible impact on the final speed, since coding a full 64 × 64 × 16 block takes a minimum of 1024 · 3 · 14 + 1024 = 44,032 cycles; filling the pipeline therefore takes at most 15/44,032 · 100 ≈ 0.03% of the total cycles.
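The cycle count can be reproduced with a short back-of-the-envelope calculation. The constants come from the text; interpreting the 1024 factor as the number of 4-row stripe columns per bitplane (64 × 64 samples / 4 rows) is our assumption:

```python
# Pipeline fill overhead for one 64 x 64 x 16 code block (values from the text).
STAGES = 15
STRIPE_COLUMNS = 64 * 64 // 4   # assumed: 4-row stripe columns scanned per plane

# Minimum cycles: 3 coding passes over 14 bitplanes, plus a single
# (cleanup-only) pass over the most significant plane.
min_cycles = STRIPE_COLUMNS * 3 * 14 + STRIPE_COLUMNS   # 44,032
fill_overhead_pct = STAGES / min_cycles * 100           # ~0.03%
print(min_cycles, round(fill_overhead_pct, 2))
```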

**Figure 12.** Detailed pipeline of the tier 1 coder. In dotted orange, the separation between stages. Each FIFO introduces two stages (read/write).

#### **5. Results**

The hardware architecture described in Section 4 has been implemented in VHDL for the specification of the tier 1 coder. Moreover, we have used the 2020 Xilinx Vivado Design Suite environment to specify the complete system. The full system has been implemented on a VC709 board, a reconfigurable board with a single Virtex-7 XC7VX690T, two DDR3 SDRAM DIMM slots holding up to 4 GB each, an RS232 port, and some additional components not used by our implementation. The HDL model has been verified via simulation and physical prototyping using a memory controller for input/output.

Table 1 shows the frequency and FPGA slice occupancy for the full tier 1 coder and its modules and submodules. More details are given in the following list:


**Table 1.** Frequency and occupancy for the different modules that make up the full tier 1 coder. Results are for the Virtex-7 XC7VX690T FPGA with a depth of 32 set for all queues.


All in all, the full tier 1 coder is able to work at 255 MHz. At that speed, the bottleneck is the number of CxD pairs processed by the MQ-coder at 255 MCxD/s. By studying how many CxD pairs are produced, the input speed can be calculated:


Thus, the input rate needed to generate 255 MCxD/s would range from 247.3 Mb/s to 1.01 Gb/s. However, the BPC core is only capable of processing 380 Mb/s, so in practice this range is limited to 247.3–380 Mb/s.

The exact value within this range depends, of course, on the redundancy of the data. The authors of [31] compressed five images of size 512 × 512 × 10 and noted that the average *p*/*b* rate was 0.56. This means that, on average, the input rate needed to sustain 255 MCxD/s would be 455 Mb/s. Thus, it is safe to say that the tier 1 coder will consistently perform at its 380 Mb/s limit.
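A minimal sketch of this rate derivation, assuming (as the figures imply) that the *p*/*b* ratio denotes CxD pairs produced per input bit:

```python
# Required input bit rate to keep the MQ-coder saturated,
# given an average CxD-pairs-per-bit ratio (0.56, reported in [31]).
MQ_THROUGHPUT_MCXD = 255.0        # MCxD pairs per second
PAIRS_PER_BIT = 0.56              # average p/b ratio from [31]

input_rate_mbps = MQ_THROUGHPUT_MCXD / PAIRS_PER_BIT   # ~455 Mb/s
# The BPC core caps the input at 380 Mb/s, so that limit dominates.
effective_rate_mbps = min(input_rate_mbps, 380.0)
print(round(input_rate_mbps), effective_rate_mbps)
```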

#### *5.1. Comparison*

A comparison with other implementations can be seen in Table 2. Only the best implementations found in the literature have been taken into account.

As seen, this implementation of the BPC works more than four times faster than other FPGA implementations, surpassing even ASIC designs in throughput.

With regard to the MQ-coder, this design doubles the performance of previous FPGA designs, falling short only of 0.18 μm CMOS designs. Porting this design to that technology would be expected to make it faster than the competition, since other implementations have reported a 4× speedup when doing so [32].


**Table 2.** Comparison with other implementations.

\* Although not specified, the architecture is similar to the one presented here so a similar relationship between frequency and speed was expected. \*\* Requires external memory for data and/or internal variables.

#### *5.2. Acceleration of JYPEC*

To assess its impact on hyperspectral image compression under JYPEC, six images from two libraries have been compressed by JYPEC with and without acceleration: four from the Spectir library [55] and two from the CCSDS 123 test data set [56]. The image characteristics are shown in Table 3, with previews in Figure 13.


**Table 3.** Images used for testing.

**Figure 13.** Small cutouts of the images from Table 3. In reading order: CUP, SUW, DHO, BEL, REN, CRW.

The results have been obtained on a DELL XPS 13 9360 computer, with an i7-7500U processor with a thermal design power (TDP) of 15 W, 8 GB of RAM running at 1866 MHz, and 256 GB of SSD PCIe storage. For the accelerated version, the time of coding in the processor was replaced with the time of coding on the FPGA itself. Memory transfer times were not taken into account, because the PCIe interface of the VC709 board works at 25 GB/s and the typical image size is 500 MB, so a transfer takes about 20 ms and does not impact the results.
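The transfer-time estimate can be checked directly (all figures taken from the text):

```python
# PCIe transfer-time estimate used to justify ignoring memory transfers.
PCIE_BANDWIDTH_GBPS = 25.0    # GB/s over the VC709 PCIe link
IMAGE_SIZE_GB = 0.5           # typical 500 MB image

transfer_time_ms = IMAGE_SIZE_GB / PCIE_BANDWIDTH_GBPS * 1000   # 20 ms
print(transfer_time_ms)
```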

The speedup attained is shown in Figure 14. The average speedup obtained was 3.6, ranging from 1.6 for the DHO image to 7.5 for the CRW image.

**Figure 14.** Speedup when using an FPGA as an accelerator. For each image, the top bars indicate the sped-up version, and the bottom bars are the non-sped-up one. A dashed bar indicates the real-time threshold, which without acceleration was only met by the DHO image.

The code for the software JYPEC implementation can be accessed at [57], and the accelerator code is available at [58].

#### **6. Conclusions**

JYPEC is a complex algorithm that demands high-performance hardware for real-time execution. The most costly part is the tier 1 coder within JPEG2000, since its erratic branching is very hard to optimize for traditional processors.

Its very simple arithmetic and logic operations, however, make this part of the algorithm ideal for execution on an FPGA. A very fast architecture for the full tier 1 coder within JPEG2000 has been developed based on two main ideas:


The presented design doubles the speed of any previous design on FPGA, coming close in performance to 0.18 μm CMOS technology in single-core tests.

In the context of hyperspectral imaging, it brings complex lossy compression to real-time performance under the AVIRIS-ng sensor threshold (30.72 MS/s, for a total of 491.52 Mb/s). This allows very high data rates to be reduced on-the-fly for long-term storage, while preserving high quality for subsequent analyses.
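As a quick sanity check of the threshold figures (assuming 16-bit samples, consistent with the 64 × 64 × 16 blocks used throughout):

```python
# AVIRIS-ng real-time threshold: 30.72 MS/s of (assumed) 16-bit samples.
SAMPLE_RATE_MSPS = 30.72
BITS_PER_SAMPLE = 16

data_rate_mbps = SAMPLE_RATE_MSPS * BITS_PER_SAMPLE   # 491.52 Mb/s
print(data_rate_mbps)
```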

**Author Contributions:** D.B. designed the algorithm; C.G. and D.M. conceived and designed the experiments; D.B. performed the experiments; D.B., C.G. and D.M. analyzed the data; D.B., C.G. and D.M. wrote the paper. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work has been supported by the Spanish MINECO projects TIN2013-40968-P and TIN2017-87237-P.

**Acknowledgments:** The authors would like to thank the anonymous reviewers. Their comments and suggestions greatly improved this paper.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
