#### 4.3.2. High Fanout

Register duplication is one of the most common ways to resolve high-fanout violations, and it can be applied to relieve the fanout problem described in Section 3.2. The difficulty is making HLS replicate registers automatically, since HLS has no inherent support for this feature. To solve this problem, we modify part of the C/C++ code so that the high-fanout task is split into two or more identical subtasks. This allows HLS to generate duplicated circuits, each of which drives only part of the original load, thereby reducing the fanout. With this optimization, our FPGA implementation operates at clock frequencies above 200 MHz.
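The split-into-identical-subtasks idea can be illustrated with a minimal HLS-style C++ sketch. It is not the authors' code: the names `scale_half` and `scale_split`, the operation (one coefficient multiplied into every element of a vector), and the unit count `kL = 8` are hypothetical choices for illustration. Instead of one coefficient register feeding all `kL` multipliers, each of two identical subtasks receives its own copy and drives only half of them. The `#pragma HLS` lines follow Vitis HLS syntax and are ignored by an ordinary C++ compiler. Whether a given tool really keeps the two copies as separate registers depends on its inlining and optimization settings, which is why inlining of the subtask is disabled here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical number of parallel multipliers sharing one operand.
constexpr std::size_t kL = 8;
constexpr std::size_t kHalf = kL / 2;
static_assert(kL % 2 == 0, "kL must be even to split into two identical halves");

// Identical subtask: multiplies its own private copy of the shared
// coefficient into half of the input vector.
static void scale_half(float coef, const float in[kHalf], float out[kHalf]) {
#pragma HLS INLINE off
    for (std::size_t i = 0; i < kHalf; ++i) {
#pragma HLS UNROLL
        out[i] = coef * in[i];
    }
}

// Top-level task: the high-fanout operation is split into two calls of the
// same subtask, so each coefficient copy drives kHalf multipliers, not kL.
void scale_split(float coef, const float in[kL], float out[kL]) {
    float coef_a = coef;  // copy feeding the first subtask
    float coef_b = coef;  // copy feeding the second subtask
    scale_half(coef_a, in, out);
    scale_half(coef_b, in + kHalf, out + kHalf);
}
```

Functionally `scale_split` is identical to a single loop over all `kL` elements; only the structure handed to the HLS tool changes. Splitting into more than two subtasks follows the same pattern when the fanout is still too high.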

## *4.4. Specific Features*

#### 4.4.1. Scalability and Portability

Parallel computation and memory units are placed in the stages of the DPBS-CEM core architecture to accelerate the matrix-multiplication operations. The number of parallel units equals *L*, the number of spectral bands. Because *L* is a parameter of the HLS code, the core framework of DPBS-CEM can be scaled to HSIs with different numbers of bands simply by changing its value; this parameterized HLS design greatly improves the scalability of the system. In addition, the framework relies on neither device-specific FPGA primitives nor vendor-provided IP cores, so it can be easily ported to other types of FPGAs.
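One way to expose *L* as a design parameter in HLS C++ is a template argument, as in the minimal sketch below. It assumes, for illustration only, an inner product of two *L*-band vectors as a representative building block of the matrix multiplication; the function name `inner_product` and its structure are hypothetical and not taken from the DPBS-CEM source. Fully unrolling the first loop yields *L* parallel multipliers, so re-synthesizing with a different *L* rescales the hardware without touching the algorithm code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch: L (number of spectral bands) is a compile-time
// parameter, so the same source is re-synthesized for a new sensor by
// instantiating it with a different L.
template <std::size_t L>
float inner_product(const float a[L], const float b[L]) {
    float prod[L];
#pragma HLS ARRAY_PARTITION variable=prod complete
    // With full unrolling, this loop maps to L parallel multipliers.
    for (std::size_t i = 0; i < L; ++i) {
#pragma HLS UNROLL
        prod[i] = a[i] * b[i];
    }
    // Accumulate the L partial products.
    float acc = 0.0f;
    for (std::size_t i = 0; i < L; ++i) {
#pragma HLS UNROLL
        acc += prod[i];
    }
    return acc;
}
```

Usage: `inner_product<4>(a, b)` builds a 4-wide datapath, and changing the template argument to another band count yields a datapath of that width from the same source. The code uses only standard C++ plus tool pragmas, consistent with the point that the design does not depend on vendor IP cores.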
