*5.1. The Module A*

Module **A** produces the $L/2$-point sequence $\{ a_{2k-1}(n) + a_{2k}(n) \}$ according to Equations (3) and (13) in every clock cycle. It comprises $L + 1$ sub-modules $A_k$ ($k = 0, 1, 2, \ldots, L$) that first accumulate $\{ f(n + i) \}$ to generate the corresponding $\{ a_k(n) \}$, and then sum each pair of adjacent $a_k(n)$ to obtain $a_{2k-1}(n) + a_{2k}(n)$. We assume the execution time of module **A** is $T_A$ clock cycles. The $N$-point $f(n + i)$ should be fed into the sub-modules $\{ A_k \}$ gradually.

Because the correlation kernel $\{ g(i) \}$ is fixed, the computational strategy for Equations (3) and (13) is known in advance, so the structure of $A_k$ can be simplified to use fewer adders and less data transfer. For example, for $N = 4$, $L = 4$ and $\{ g(i) \} = \{ 1, 2, 3, 4 \}$, module **A** can be simplified as shown in Figure 7, with 2 adders and $T_A = 1$. However, for $N = 4$, $L = 4$ and $\{ g(i) \} = \{ 2, 1, 4, 2 \}$, module **A** must be redesigned as shown in Figure 8, with 2 adders, 3 latches and $T_A = \log_2 4 = 2$. Therefore, the structure of module **A** should not be fixed, but should change with the sequence $\{ g(i) \}$ to reduce its hardware complexity. We also show module **A** using the maximum number of adders when $\{ g(i) \} = \{ 4, 4, 4, 4 \}$ in Figure 9a, and module **A** using 0 adders when $\{ g(i) \} = \{ 2, 4, 6, 8 \}$ in Figure 9b. From Figures 7–9, it follows that the number of adders in module **A** ranges from 0 to $N - 1$, and the latency $T_A$ ranges from 0 to $\log_2 N$.

**Figure 7.** The module **A** for { *g*(*i*) } = {1, 2, 3, 4}.

**Figure 8.** The module **A** for { *g*(*i*) } = {2, 1, 4, 2}.

**Figure 9.** The module **A** using different numbers of adders: (**a**) { *g*(*i*) } = {4, 4, 4, 4}; (**b**) { *g*(*i*) } = {2, 4, 6, 8}.
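The adder counts and latencies quoted for Figures 7–9 can be reproduced with a short Python sketch. It rests on assumptions that this section does not state explicitly: that Equation (3) defines $a_k(n)$ as the sum of all $f(n + i)$ with $g(i) = k$, that each pair sum is built from two-input adders arranged as a balanced tree with no sharing between pairs, and that $L$, when not supplied, is the largest kernel value rounded up to an even number. The function names `module_a_plan` and `pair_sums` are hypothetical, and latches are not modelled.

```python
def module_a_plan(g, L=None):
    """Plan module A for a fixed kernel g (illustrative sketch only).

    Assumption: a_k(n) is the sum of f(n + i) over all i with g(i) == k,
    so the pair a_{2k-1}(n) + a_{2k}(n) sums every f(n + i) whose kernel
    value is 2k-1 or 2k.
    Returns (groups, adders, latency), where groups[k-1] lists the input
    indices i that feed the k-th pair sum.
    """
    if L is None:
        # Assumption: infer L from the kernel, rounded up to an even number.
        L = max(g) + (max(g) % 2)

    groups = []
    for k in range(1, L // 2 + 1):
        groups.append([i for i, gi in enumerate(g) if gi in (2 * k - 1, 2 * k)])

    # Summing m inputs with two-input adders needs m - 1 adders.
    adders = sum(max(len(idx) - 1, 0) for idx in groups)

    # A balanced adder tree over m inputs has depth ceil(log2(m));
    # (m - 1).bit_length() computes that exactly for m > 1.
    latency = max(
        ((len(idx) - 1).bit_length() if len(idx) > 1 else 0 for idx in groups),
        default=0,
    )
    return groups, adders, latency


def pair_sums(f_window, g, L=None):
    """Evaluate { a_{2k-1}(n) + a_{2k}(n) } for one window f(n), ..., f(n+N-1)."""
    groups, _, _ = module_a_plan(g, L)
    return [sum(f_window[i] for i in idx) for idx in groups]


if __name__ == "__main__":
    for kernel in ([1, 2, 3, 4], [2, 1, 4, 2], [4, 4, 4, 4], [2, 4, 6, 8]):
        _, adders, latency = module_a_plan(kernel)
        print(kernel, "adders =", adders, "T_A =", latency)
```

Under these assumptions the sketch gives 2 adders and $T_A = 1$ for {1, 2, 3, 4}, 2 adders and $T_A = 2$ for {2, 1, 4, 2}, 3 adders and $T_A = 2$ for {4, 4, 4, 4}, and 0 adders and $T_A = 0$ for {2, 4, 6, 8}, which agrees with the figures quoted in the text.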
