
On the Derivation of Winograd-Type DFT Algorithms for Input Sequences Whose Length Is a Power of Two

by Mateusz Raciborski *,† and Aleksandr Cariow †
Faculty of Computer Science and Information Technology, West Pomeranian University of Technology, Żołnierska 52, 71-210 Szczecin, Poland
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Electronics 2022, 11(9), 1342; https://doi.org/10.3390/electronics11091342
Submission received: 16 March 2022 / Revised: 11 April 2022 / Accepted: 20 April 2022 / Published: 23 April 2022
(This article belongs to the Special Issue Efficient Algorithms and Architectures for DSP Applications)

Abstract:
Winograd’s algorithms are an effective tool for calculating the discrete Fourier transform (DFT). In the well-known articles, these algorithms are traditionally represented either by sets of recurrence relations or by products of sparse matrices obtained from various factorizations of the DFT matrix. Unfortunately, those papers do not show how the described relations were obtained or how the presented factorizations were found. In this paper, we use a simple, understandable and fairly unified approach to the derivation of Winograd-type DFT algorithms for the cases N = 8, N = 16 and N = 32. It is easy to verify that algorithms for other sequence lengths that are powers of two can be synthesized in a similar way.

1. Introduction

Winograd’s method for the realization of the discrete Fourier transform (DFT) has been discussed in a number of publications for several decades [1,2,3,4,5,6,7,8,9,10,11,12]. In comparison with the Cooley–Tukey fast Fourier transform (FFT) algorithms, the Winograd DFT algorithm requires substantially fewer multiplications at the cost of a few extra additions. The known papers mainly consider Winograd FFTs for small sequences of odd length. Moreover, the algorithms were presented in the form of algebraic relations or in the form of DFT matrix factorizations. However, none of the publications known to us explains how these relations were obtained or by what considerations the matrices that make up the corresponding computational procedures were constructed.
In this paper, we want to show a simple, understandable and fairly unified approach to the derivation of Winograd-type FFT algorithms for the cases N = 8, N = 16 and N = 32. It is easy to verify that algorithms for other lengths of sequences that are powers of two can be synthesized similarly.

2. Preliminary Remarks

The discrete Fourier transform (DFT) is one of the most important tools in digital signal and image processing. The DFT can be defined as follows:
$$y_k = \sum_{n=0}^{N-1} x_n e^{-j\frac{2\pi nk}{N}}, \quad k = 0, 1, \dots, N-1, \qquad (1)$$
where $x_n$, $n = 0, 1, \dots, N-1$, is a uniformly sampled sequence, $y_k$ is the $k$-th DFT coefficient, and $j = \sqrt{-1}$ is the imaginary unit.
In vector–matrix notation, we can rewrite (1) in the following form:
$$\mathbf{Y}_{N \times 1} = \mathbf{E}_N \mathbf{X}_{N \times 1}, \qquad (2)$$
where
$$\mathbf{X}_{N \times 1} = [x_0, x_1, \dots, x_{N-1}]^{\mathrm{T}}, \quad \mathbf{Y}_{N \times 1} = [y_0, y_1, \dots, y_{N-1}]^{\mathrm{T}}$$
and
$$\mathbf{E}_N = \{w^{kn}\}, \quad w^{kn} = e^{-j\frac{2\pi nk}{N}}, \quad k, n = 0, 1, \dots, N-1.$$
Implementation of calculations in accordance with expression (2), especially for large N, requires performing a large number of arithmetic operations, which in turn leads to an increase in computation time.
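The direct evaluation of (2) can be sketched in a few lines; the function name `dft_direct` and the use of NumPy are our own illustrative choices, not part of the paper:

```python
import numpy as np

def dft_direct(x):
    """Direct evaluation of (2): y = E_N x, where E_N[k, n] = exp(-j*2*pi*k*n/N).

    Costs O(N^2) complex multiplications and additions.
    """
    x = np.asarray(x, dtype=complex)
    N = len(x)
    k = np.arange(N)
    E = np.exp(-2j * np.pi * np.outer(k, k) / N)  # the N x N DFT matrix E_N
    return E @ x
```

For small N this agrees with any standard FFT routine; the quadratic operation count is precisely what the factorizations discussed next avoid.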
In 1965, J. Cooley and J. Tukey proposed a fast algorithm for computing the discrete Fourier transform with a drastically reduced number of arithmetical operations. Mathematically, fast Fourier transform algorithms are based on factorizing the Fourier matrix into a product of sparse matrices, i.e., matrices with many zero entries. This factorization can be carried out in different ways. In the case of the Cooley–Tukey algorithm, we are dealing with the representation of the original matrix as a product of $\log_2 N$ sparse structured matrices. As is well known, the complexity of this algorithm is approximately $\frac{N}{2}\log_2 N$ multiplications and about $N \log_2 N$ additions of complex numbers.
Another effective algorithm for calculating the DFT is the Winograd FFT algorithm. In comparison with the Cooley–Tukey FFT algorithm, it requires substantially fewer multiplications at the cost of a few extra additions: Winograd proved that the multiplicative complexity of FFT algorithms can be significantly reduced at the price of some increase in additive complexity. The Winograd Fourier transform algorithm (WFTA) is an FFT algorithm which reduces the number of multiplications from order $O(N^2)$ in the DFT to order $N$. The literature known to the authors [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15] mainly considers Winograd FFT algorithms that implement the DFT for a limited set of small-length sequences. As a rule, these algorithms are represented as a set of algebraic relations [1,2,3,4,5,8], although matrix interpretations of the Winograd FFT algorithms are available too [6,9]. In the case of the matrix formulation, the factorization of the DFT matrix differs from the factorization used in the Cooley–Tukey FFT algorithms. Moreover, the mechanism for deriving such algorithms is unique to each specific case. In addition, methods for deriving the recurrence relations have not been published anywhere, and the ways of deriving the factorized representations of the DFT matrices have never been explained. In this paper, we show a simple, understandable and fairly unified approach to the derivation of Winograd-like FFT algorithms for the case when the input sequence length is a power of two.

3. Short Background

The main idea of the proposed approach is to use a new method for factorizing the DFT matrix, one that differs from Winograd’s factorization. In contrast to the Winograd factorization, we propose the following unified method of DFT matrix decomposition:
$$\mathbf{E}_{2^{i+1}} = (\mathbf{H}_2 \otimes \mathbf{I}_{2^i})(\mathbf{E}_{2^i} \oplus \mathbf{Q}_{2^i})\,\mathbf{P}_{2^{i+1}}(\pi_{2^{i+1}}), \qquad (3)$$
where
$$\mathbf{P}_{2^{i+1}}(\pi_{2^{i+1}}) = \begin{bmatrix} \mathbf{I}_{2^{i-1}} \otimes \boldsymbol{\Psi}_{2 \times 4} \\ (\mathbf{I}_{2^{i-1}} \otimes \boldsymbol{\Psi}_{2 \times 4})\,\mathbf{I}_{2^{i+1}}^{(1)} \end{bmatrix}, \quad \boldsymbol{\Psi}_{2 \times 4} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}, \quad i = 1, 2, 3,$$
$\mathbf{E}_k$ is the $k \times k$ DFT matrix; $\mathbf{Q}_k$ is a “prefix” matrix containing a constellation of twiddle factors specific to each $N$; $\mathbf{I}_k$ is the $k \times k$ identity matrix; $\mathbf{H}_2$ is the order-2 Hadamard matrix; $\mathbf{I}_{2^{i+1}}^{(1)}$ is the matrix obtained from the $2^{i+1} \times 2^{i+1}$ identity matrix by cyclically shifting its columns one position to the right; and the signs “⊗” and “⊕” denote the tensor (Kronecker) product and the direct sum of two matrices, respectively [16,17].
Then, the generalized scheme for the synthesis of Winograd-type DFT algorithms for $N$ equal to a power of two can be described as follows:
$$\mathbf{Y}_{2^{i+1} \times 1} = (\mathbf{H}_2 \otimes \mathbf{I}_{2^i})(\mathbf{E}_{2^i} \oplus \mathbf{Q}_{2^i})\,\mathbf{P}_{2^{i+1}}(\pi_{2^{i+1}})\,\mathbf{X}_{2^{i+1} \times 1}. \qquad (4)$$
The methods for factorizing the matrices $\mathbf{E}_k$ and $\mathbf{Q}_k$ are different, but both lead to a factorization of the BCD type [18] similar to the Winograd factorization. Moreover, as follows from expression (3), the expansions for small $N$ are part of the expansions for larger lengths of input sequences. When synthesizing the algorithms for the separate $\mathbf{E}_k$ and $\mathbf{Q}_k$, we will use the templates of matrix structures and identities presented in [19,20].
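Decomposition (3) is easy to check numerically. The sketch below (our own illustration; only the matrix names follow the text) builds the DFT matrix of order $2N$, permutes its columns into even–odd order, reads off $\mathbf{E}_N$ and $\mathbf{Q}_N$, and reassembles the original matrix according to (3):

```python
import numpy as np

def dft_matrix(N):
    k = np.arange(N)
    return np.exp(-2j * np.pi * np.outer(k, k) / N)

def even_odd_perm(N):
    """P_N(pi_N): row r selects sample order[r], so P @ x = [x_even; x_odd]."""
    order = list(range(0, N, 2)) + list(range(1, N, 2))
    return np.eye(N)[order]

N = 8
E2N = dft_matrix(2 * N)
P = even_odd_perm(2 * N)
Et = E2N @ P.T                    # column-permuted DFT matrix (E~ in the text)
EN, QN = Et[:N, :N], Et[:N, N:]   # top blocks: E_N and the prefix matrix Q_N
H2I = np.kron([[1, 1], [1, -1]], np.eye(N))   # H2 (x) I_N
zeros = np.zeros((N, N))
rebuilt = H2I @ np.block([[EN, zeros], [zeros, QN]]) @ P   # (H2 (x) I_N)(E_N (+) Q_N) P
```

The reassembled product matches the full DFT matrix, and the top-left block is itself the $N$-point DFT matrix, which is what makes the decomposition recursive.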

4. Synthesis of the Fast Winograd-Type DFT Algorithms

Let us show, using specific examples, how this approach works.

4.1. Fast DFT Algorithm for N = 4

As an example, suppose that N = 4. Then (2) can be rewritten as
$$\mathbf{Y}_{4 \times 1} = \mathbf{E}_4 \mathbf{X}_{4 \times 1},$$
where
$$\mathbf{E}_4 = \begin{bmatrix} a_4 & a_4 & a_4 & a_4 \\ a_4 & b_4 & -a_4 & -b_4 \\ a_4 & -a_4 & a_4 & -a_4 \\ a_4 & -b_4 & -a_4 & b_4 \end{bmatrix}, \quad a_4 = 1, \quad b_4 = -j,$$
$$\mathbf{X}_{4 \times 1} = [x_0, x_1, x_2, x_3]^{\mathrm{T}}, \quad \mathbf{Y}_{4 \times 1} = [y_0, y_1, y_2, y_3]^{\mathrm{T}}.$$
Let us now define the permutation $\pi_4$ and write it as a matrix in this way:
$$\pi_4 = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 1 & 3 & 2 & 4 \end{pmatrix}, \quad \mathbf{P}_4(\pi_4) = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$
Permuting the columns of the matrix $\mathbf{E}_4$ according to the permutation $\pi_4$, we obtain the matrix
$$\tilde{\mathbf{E}}_4 = \begin{bmatrix} \mathbf{E}_2 & \mathbf{Q}_2 \\ \mathbf{E}_2 & -\mathbf{Q}_2 \end{bmatrix} = \mathbf{E}_4 \mathbf{P}_4(\pi_4),$$
where
$$\mathbf{E}_2 = \begin{bmatrix} a_4 & a_4 \\ a_4 & -a_4 \end{bmatrix} \quad \text{and} \quad \mathbf{Q}_2 = \begin{bmatrix} a_4 & a_4 \\ b_4 & -b_4 \end{bmatrix}.$$
According to the concept, Expression (4) for a given transform size can be rewritten as follows:
$$\mathbf{Y}_{4 \times 1} = (\mathbf{H}_2 \otimes \mathbf{I}_2)(\mathbf{E}_2 \oplus \mathbf{Q}_2)\,\mathbf{P}_4(\pi_4)\,\mathbf{X}_{4 \times 1},$$
where
$$\mathbf{H}_2 \otimes \mathbf{I}_2 = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \otimes \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & -1 \end{bmatrix} = \mathbf{W}_4^{(0)}.$$
As can be seen, after rearranging the columns of the DFT matrix $\mathbf{E}_4$, it decomposes, as follows from the proposed technique, into the order-2 DFT matrix $\mathbf{E}_2$ and the order-2 prefix matrix $\mathbf{Q}_2$.
For matrices E 2 and Q 2 , we can offer the following factorization schemes leading to a reduction in computational complexity:
$$\mathbf{E}_2 = \begin{bmatrix} a_4 & a_4 \\ a_4 & -a_4 \end{bmatrix} = (a_4 \oplus a_4)\,\mathbf{H}_2, \quad \mathbf{Q}_2 = \begin{bmatrix} a_4 & a_4 \\ b_4 & -b_4 \end{bmatrix} = (a_4 \oplus b_4)\,\mathbf{H}_2.$$
Taking into account the above factorization schemes, we can finally write
$$\mathbf{Y}_{4 \times 1} = \mathbf{W}_4^{(0)} \mathbf{D}_4 \mathbf{W}_4^{(1)} \mathbf{P}_4(\pi_4)\,\mathbf{X}_{4 \times 1}, \qquad (7)$$
where
$$\mathbf{W}_4^{(1)} = \mathbf{H}_2 \oplus \mathbf{H}_2 = \mathbf{I}_2 \otimes \mathbf{H}_2 = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & -1 \end{bmatrix},$$
$$\mathbf{D}_4 = \mathrm{diag}(\varphi_0, \varphi_1, \varphi_2, \varphi_3), \quad \varphi_0 = \varphi_1 = \varphi_2 = a_4 = 1, \quad \varphi_3 = b_4 = -j.$$
Figure 1 shows a data flow graph of the synthesized algorithm for the 4-point DFT. As can be seen, in this case, the algorithm takes only eight additions.
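The eight additions of the N = 4 flow graph can be written out directly; this is a sketch of Expression (7) in code form (the function name `dft4` is ours):

```python
import numpy as np

def dft4(x):
    """4-point DFT via Y = W4(0) D4 W4(1) P4(pi4) X: eight additions and no
    non-trivial multiplications (the factor -j is a sign change and swap)."""
    x0, x1, x2, x3 = x
    # P4(pi4) reorders the input to (x0, x2, x1, x3);
    # W4(1) = H2 (+) H2 then applies two butterflies:
    s0, s1 = x0 + x2, x0 - x2
    s2, s3 = x1 + x3, x1 - x3
    s3 = -1j * s3                      # D4 = diag(1, 1, 1, -j)
    # W4(0) = H2 (x) I2: final pair of butterflies
    return np.array([s0 + s2, s1 + s3, s0 - s2, s1 - s3])
```

The result matches the direct evaluation of (2) for N = 4.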

4.2. Fast DFT Algorithm for N = 8

As an example, suppose that N = 8. Then (2) can be rewritten as
$$\mathbf{Y}_{8 \times 1} = \mathbf{E}_8 \mathbf{X}_{8 \times 1},$$
where
$$\mathbf{X}_{8 \times 1} = [x_0, x_1, x_2, x_3, x_4, x_5, x_6, x_7]^{\mathrm{T}}, \quad \mathbf{Y}_{8 \times 1} = [y_0, y_1, y_2, y_3, y_4, y_5, y_6, y_7]^{\mathrm{T}},$$
$$a_8 = 1, \quad b_8 = 0.7071 - j0.7071, \quad c_8 = -j, \quad d_8 = -0.7071 - j0.7071,$$
$$\mathbf{E}_8 = \begin{bmatrix}
a_8 & a_8 & a_8 & a_8 & a_8 & a_8 & a_8 & a_8 \\
a_8 & b_8 & c_8 & d_8 & -a_8 & -b_8 & -c_8 & -d_8 \\
a_8 & c_8 & -a_8 & -c_8 & a_8 & c_8 & -a_8 & -c_8 \\
a_8 & d_8 & -c_8 & b_8 & -a_8 & -d_8 & c_8 & -b_8 \\
a_8 & -a_8 & a_8 & -a_8 & a_8 & -a_8 & a_8 & -a_8 \\
a_8 & -b_8 & c_8 & -d_8 & -a_8 & b_8 & -c_8 & d_8 \\
a_8 & -c_8 & -a_8 & c_8 & a_8 & -c_8 & -a_8 & c_8 \\
a_8 & -d_8 & -c_8 & -b_8 & -a_8 & d_8 & c_8 & b_8
\end{bmatrix}.$$
Let us now define the permutation $\pi_8$ in the following form:
$$\pi_8 = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 1 & 3 & 5 & 7 & 2 & 4 & 6 & 8 \end{pmatrix}.$$
Permutation $\pi_8$ can be written as a matrix in this way:
$$\mathbf{P}_8(\pi_8) = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}.$$
Permuting the columns of the matrix $\mathbf{E}_8$ according to the permutation $\pi_8$, we obtain the matrix
$$\tilde{\mathbf{E}}_8 = \begin{bmatrix} \mathbf{E}_4 & \mathbf{Q}_4 \\ \mathbf{E}_4 & -\mathbf{Q}_4 \end{bmatrix}, \quad \text{so that} \quad \mathbf{E}_8 = \tilde{\mathbf{E}}_8\,\mathbf{P}_8(\pi_8),$$
where
$$\mathbf{E}_4 = \begin{bmatrix} a_8 & a_8 & a_8 & a_8 \\ a_8 & c_8 & -a_8 & -c_8 \\ a_8 & -a_8 & a_8 & -a_8 \\ a_8 & -c_8 & -a_8 & c_8 \end{bmatrix} \quad \text{and} \quad \mathbf{Q}_4 = \begin{bmatrix} a_8 & a_8 & a_8 & a_8 \\ b_8 & d_8 & -b_8 & -d_8 \\ c_8 & -c_8 & c_8 & -c_8 \\ d_8 & b_8 & -d_8 & -b_8 \end{bmatrix}.$$
According to the concept, Expression (4) for a given transform size can be rewritten as follows:
$$\mathbf{Y}_{8 \times 1} = (\mathbf{H}_2 \otimes \mathbf{I}_4)(\mathbf{E}_4 \oplus \mathbf{Q}_4)\,\mathbf{P}_8(\pi_8)\,\mathbf{X}_{8 \times 1}. \qquad (8)$$
Such a structure of the matrix $\tilde{\mathbf{E}}_8$ allows us to apply a divide-and-conquer approach that recursively breaks down a matrix–vector product of order eight into two smaller matrix–vector products of order four [19]. If we write the matrix $\mathbf{E}_8$ as the product $\tilde{\mathbf{E}}_8\,\mathbf{P}_8(\pi_8)$, Equation (8) takes the form:
$$\mathbf{Y}_{8 \times 1} = \mathbf{W}_8^{(0)}(\mathbf{E}_4 \oplus \mathbf{Q}_4)\,\mathbf{P}_8(\pi_8)\,\mathbf{X}_{8 \times 1},$$
where
$$\mathbf{W}_8^{(0)} = \mathbf{H}_2 \otimes \mathbf{I}_4.$$
Permuting the columns of the matrix $\mathbf{E}_4$ and the rows of the matrix $\mathbf{Q}_4$ according to the permutation $\pi_4$, we obtain the matrices
$$\tilde{\mathbf{E}}_4 = \begin{bmatrix} \mathbf{A}_2 & \mathbf{B}_2 \\ \mathbf{A}_2 & -\mathbf{B}_2 \end{bmatrix} \quad \text{and} \quad \tilde{\mathbf{Q}}_4 = \begin{bmatrix} \mathbf{C}_2 & \mathbf{C}_2 \\ \mathbf{D}_2 & -\mathbf{D}_2 \end{bmatrix},$$
where
$$\mathbf{A}_2 = \begin{bmatrix} a_8 & a_8 \\ a_8 & -a_8 \end{bmatrix}, \quad \mathbf{B}_2 = \begin{bmatrix} a_8 & a_8 \\ c_8 & -c_8 \end{bmatrix}, \quad \mathbf{C}_2 = \begin{bmatrix} a_8 & a_8 \\ c_8 & -c_8 \end{bmatrix}, \quad \mathbf{D}_2 = \begin{bmatrix} b_8 & d_8 \\ d_8 & b_8 \end{bmatrix}.$$
Such structures of the matrices $\tilde{\mathbf{E}}_4$ and $\tilde{\mathbf{Q}}_4$ allow us to apply the same factorization schemes. Therefore, we can write
$$\mathbf{Y}_{8 \times 1} = \mathbf{W}_8^{(0)} \tilde{\mathbf{W}}_8^{(0)}(\mathbf{A}_2 \oplus \mathbf{B}_2 \oplus \mathbf{C}_2 \oplus \mathbf{D}_2)\,\tilde{\mathbf{W}}_8^{(1)}\,\mathbf{P}_8(\pi_8)\,\mathbf{X}_{8 \times 1},$$
where
$$\tilde{\mathbf{W}}_8^{(0)} = (\mathbf{H}_2 \otimes \mathbf{I}_2) \oplus \mathbf{P}_4(\pi_4), \quad \tilde{\mathbf{W}}_8^{(1)} = \mathbf{P}_4(\pi_4) \oplus (\mathbf{H}_2 \otimes \mathbf{I}_2).$$
For the matrices $\mathbf{A}_2$, $\mathbf{B}_2$, $\mathbf{C}_2$ and $\mathbf{D}_2$, we can offer the following factorization schemes leading to a reduction in computational complexity:
$$\mathbf{A}_2 = (a_8 \oplus a_8)\,\mathbf{H}_2, \quad \mathbf{B}_2 = \mathbf{C}_2 = (a_8 \oplus c_8)\,\mathbf{H}_2,$$
$$\mathbf{D}_2 = \begin{bmatrix} b_8 & d_8 \\ d_8 & b_8 \end{bmatrix} = \mathbf{H}_2\,\frac{1}{2}\big[(b_8 + d_8) \oplus (b_8 - d_8)\big]\,\mathbf{H}_2.$$
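The factorization of $\mathbf{D}_2$ trades a direct 2 × 2 complex product for two multiplications by the constants $\frac{1}{2}(b_8 + d_8) = -j0.7071$ and $\frac{1}{2}(b_8 - d_8) = 0.7071$. A quick numerical check (our own sketch, not from the paper):

```python
import numpy as np

H2 = np.array([[1, 1], [1, -1]], dtype=complex)
b8 = np.exp(-1j * np.pi / 4)        # 0.7071 - j0.7071
d8 = np.exp(-3j * np.pi / 4)        # -0.7071 - j0.7071

D2 = np.array([[b8, d8], [d8, b8]])
# H2 @ diag((b8+d8)/2, (b8-d8)/2) @ H2 restores D2
D2_fact = H2 @ np.diag([(b8 + d8) / 2, (b8 - d8) / 2]) @ H2
```

The two diagonal entries are purely imaginary and purely real, respectively, which is why only two non-trivial real-valued multipliers remain in the final algorithm.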
Taking into account the above factorization schemes, we can finally write
$$\mathbf{Y}_{8 \times 1} = \mathbf{W}_8^{(0)} \tilde{\mathbf{W}}_8^{(0)} \mathbf{W}_8^{(3)} \mathbf{D}_8 \mathbf{W}_8^{(4)} \tilde{\mathbf{W}}_8^{(1)}\,\mathbf{P}_8(\pi_8)\,\mathbf{X}_{8 \times 1}, \qquad (12)$$
where
$$\mathbf{W}_8^{(4)} = \mathbf{I}_4 \otimes \mathbf{H}_2, \quad \mathbf{W}_8^{(3)} = \mathbf{I}_6 \oplus \mathbf{H}_2,$$
$$\mathbf{D}_8 = \mathrm{diag}(\varphi_0, \varphi_1, \dots, \varphi_7),$$
$$\varphi_0 = \varphi_1 = \varphi_2 = \varphi_4 = a_8 = 1, \quad \varphi_3 = \varphi_5 = c_8 = -j,$$
$$\varphi_6 = \tfrac{1}{2}(b_8 + d_8) = -j0.7071, \quad \varphi_7 = \tfrac{1}{2}(b_8 - d_8) = 0.7071.$$
Expression (12) describes the Winograd-type fast Fourier transform algorithm for N = 8. Figure 2 shows a data flow graph of the synthesized algorithm for the 8-point DFT. As can be seen, in this case the algorithm takes 2 multiplications and 26 additions.
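Expression (12) can be checked end to end by multiplying out its factor matrices; the helper `dsum` and all variable names below are our own, but the factors follow (12):

```python
import numpy as np

H2 = np.array([[1, 1], [1, -1]], dtype=complex)

def dsum(*blocks):
    """Direct sum (block-diagonal stacking) of matrices."""
    rows = sum(b.shape[0] for b in blocks)
    cols = sum(b.shape[1] for b in blocks)
    out = np.zeros((rows, cols), dtype=complex)
    r = c = 0
    for b in blocks:
        out[r:r + b.shape[0], c:c + b.shape[1]] = b
        r += b.shape[0]
        c += b.shape[1]
    return out

b8, d8 = np.exp(-1j * np.pi / 4), np.exp(-3j * np.pi / 4)
I2, I4, I6 = np.eye(2), np.eye(4), np.eye(6)
P4 = np.eye(4)[[0, 2, 1, 3]]               # P4(pi4)
P8 = np.eye(8)[[0, 2, 4, 6, 1, 3, 5, 7]]   # P8(pi8): even-odd sort

W8_0  = np.kron(H2, I4)                    # H2 (x) I4
Wt8_0 = dsum(np.kron(H2, I2), P4)          # (H2 (x) I2) (+) P4(pi4)
W8_3  = dsum(I6, H2)                       # I6 (+) H2
D8    = np.diag([1, 1, 1, -1j, 1, -1j, (b8 + d8) / 2, (b8 - d8) / 2])
W8_4  = np.kron(I4, H2)                    # I4 (x) H2
Wt8_1 = dsum(P4, np.kron(H2, I2))          # P4(pi4) (+) (H2 (x) I2)

F8 = W8_0 @ Wt8_0 @ W8_3 @ D8 @ W8_4 @ Wt8_1 @ P8
```

The product `F8` equals the 8-point DFT matrix, and only the last two diagonal entries of `D8` are non-trivial, matching the count of 2 multiplications.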

4.3. Fast DFT Algorithm for N = 16

Now let us consider the synthesis of a similar algorithm for N = 16. In matrix–vector notation, we can rewrite the DFT in the following form:
$$\mathbf{Y}_{16 \times 1} = \mathbf{E}_{16} \mathbf{X}_{16 \times 1}, \qquad (13)$$
where
$$\mathbf{X}_{16 \times 1} = [x_0, x_1, \dots, x_{15}]^{\mathrm{T}}, \quad \mathbf{Y}_{16 \times 1} = [y_0, y_1, \dots, y_{15}]^{\mathrm{T}},$$
$$\mathbf{E}_{16} = \begin{bmatrix} \mathbf{E}_8^{(0,0)} & \mathbf{E}_8^{(0,1)} \\ \mathbf{E}_8^{(1,0)} & \mathbf{E}_8^{(1,1)} \end{bmatrix},$$
where (the subscript 16 on each entry is omitted for brevity, and $k, n = 0, 1, \dots, 7$ index rows and columns within a block)
$$\mathbf{E}_8^{(0,0)} = \begin{bmatrix}
a & a & a & a & a & a & a & a \\
a & b & c & d & e & f & g & h \\
a & c & e & g & -a & -c & -e & -g \\
a & d & g & -b & -e & -h & c & f \\
a & e & -a & -e & a & e & -a & -e \\
a & f & -c & -h & e & -b & -g & d \\
a & g & -e & c & -a & -g & e & -c \\
a & h & -g & f & -e & d & -c & b
\end{bmatrix},$$
and the remaining blocks follow from $w_{16}^{8} = -1$:
$$\big[\mathbf{E}_8^{(0,1)}\big]_{k,n} = (-1)^k \big[\mathbf{E}_8^{(0,0)}\big]_{k,n}, \quad \big[\mathbf{E}_8^{(1,0)}\big]_{k,n} = (-1)^n \big[\mathbf{E}_8^{(0,0)}\big]_{k,n}, \quad \big[\mathbf{E}_8^{(1,1)}\big]_{k,n} = (-1)^{k+n} \big[\mathbf{E}_8^{(0,0)}\big]_{k,n},$$
where
$$a_{16} = 1, \quad b_{16} = 0.9239 - j0.3827, \quad c_{16} = 0.7071 - j0.7071,$$
$$d_{16} = 0.3827 - j0.9239, \quad e_{16} = -j, \quad f_{16} = -0.3827 - j0.9239,$$
$$g_{16} = -0.7071 - j0.7071, \quad h_{16} = -0.9239 - j0.3827.$$
Let us define the permutation $\pi_{16}$ in the following form:
$$\pi_{16} = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 & 13 & 14 & 15 & 16 \\ 1 & 3 & 5 & 7 & 9 & 11 & 13 & 15 & 2 & 4 & 6 & 8 & 10 & 12 & 14 & 16 \end{pmatrix}.$$
Permuting the columns of the matrix $\mathbf{E}_{16}$ according to the permutation $\pi_{16}$, we obtain the matrix
$$\tilde{\mathbf{E}}_{16} = \begin{bmatrix} \mathbf{E}_8 & \mathbf{Q}_8 \\ \mathbf{E}_8 & -\mathbf{Q}_8 \end{bmatrix}, \quad \text{so that} \quad \mathbf{E}_{16} = \tilde{\mathbf{E}}_{16}\,\mathbf{P}_{16}(\pi_{16}),$$
where (all entries carry the subscript 16, omitted for brevity)
$$\mathbf{E}_8 = \begin{bmatrix}
a & a & a & a & a & a & a & a \\
a & c & e & g & -a & -c & -e & -g \\
a & e & -a & -e & a & e & -a & -e \\
a & g & -e & c & -a & -g & e & -c \\
a & -a & a & -a & a & -a & a & -a \\
a & -c & e & -g & -a & c & -e & g \\
a & -e & -a & e & a & -e & -a & e \\
a & -g & -e & -c & -a & g & e & c
\end{bmatrix},$$
$$\mathbf{Q}_8 = \begin{bmatrix}
a & a & a & a & a & a & a & a \\
b & d & f & h & -b & -d & -f & -h \\
c & g & -c & -g & c & g & -c & -g \\
d & -b & -h & f & -d & b & h & -f \\
e & -e & e & -e & e & -e & e & -e \\
f & -h & -b & d & -f & h & b & -d \\
g & c & -g & -c & g & c & -g & -c \\
h & f & d & b & -h & -f & -d & -b
\end{bmatrix}.$$
According to the concept, expression (4) for a given transform size can be rewritten as follows:
$$\mathbf{Y}_{16 \times 1} = (\mathbf{H}_2 \otimes \mathbf{I}_8)(\mathbf{E}_8 \oplus \mathbf{Q}_8)\,\mathbf{P}_{16}(\pi_{16})\,\mathbf{X}_{16 \times 1}.$$
Such a matrix structure allows the matrix $\tilde{\mathbf{E}}_{16}$ to be factorized in the same way as was done for the matrix of order N = 8 [19].
If we write the matrix $\mathbf{E}_{16}$ as the product $\tilde{\mathbf{E}}_{16}\,\mathbf{P}_{16}(\pi_{16})$, Equation (13) takes the form
$$\mathbf{Y}_{16 \times 1} = \mathbf{W}_{16}^{(0)}(\mathbf{E}_8 \oplus \mathbf{Q}_8)\,\mathbf{P}_{16}(\pi_{16})\,\mathbf{X}_{16 \times 1},$$
where the corresponding permutation matrix $\mathbf{P}_{16}(\pi_{16})$ takes the following form:
$$\check{\mathbf{P}}_{4 \times 8} = \begin{bmatrix} 1&0&0&0&0&0&0&0 \\ 0&0&1&0&0&0&0&0 \\ 0&0&0&0&1&0&0&0 \\ 0&0&0&0&0&0&1&0 \end{bmatrix}, \quad \hat{\mathbf{P}}_{4 \times 8} = \begin{bmatrix} 0&1&0&0&0&0&0&0 \\ 0&0&0&1&0&0&0&0 \\ 0&0&0&0&0&1&0&0 \\ 0&0&0&0&0&0&0&1 \end{bmatrix},$$
$$\check{\mathbf{P}}_{8 \times 16} = \check{\mathbf{P}}_{4 \times 8} \oplus \check{\mathbf{P}}_{4 \times 8}, \quad \hat{\mathbf{P}}_{8 \times 16} = \hat{\mathbf{P}}_{4 \times 8} \oplus \hat{\mathbf{P}}_{4 \times 8},$$
$$\mathbf{P}_{16}(\pi_{16}) = \mathbf{P}_{16}^{(0)} = \begin{bmatrix} \check{\mathbf{P}}_{8 \times 16} \\ \hat{\mathbf{P}}_{8 \times 16} \end{bmatrix} \quad \text{and} \quad \mathbf{W}_{16}^{(0)} = \mathbf{H}_2 \otimes \mathbf{I}_8.$$
Now let us permute the columns of the matrix $\mathbf{E}_8$ according to the permutation $\pi_8$. As a result of such a permutation, we obtain the matrix
$$\tilde{\mathbf{E}}_8 = \begin{bmatrix} \mathbf{A}_4 & \mathbf{B}_4 \\ \mathbf{A}_4 & -\mathbf{B}_4 \end{bmatrix},$$
where
$$\mathbf{A}_4 = \begin{bmatrix} a & a & a & a \\ a & e & -a & -e \\ a & -a & a & -a \\ a & -e & -a & e \end{bmatrix} \quad \text{and} \quad \mathbf{B}_4 = \begin{bmatrix} a & a & a & a \\ c & g & -c & -g \\ e & -e & e & -e \\ g & c & -g & -c \end{bmatrix}.$$
Then the matrix $\mathbf{E}_8$ can be represented as the product $\tilde{\mathbf{E}}_8\,\mathbf{P}_8(\pi_8) = \mathbf{W}_8^{(0)}(\mathbf{A}_4 \oplus \mathbf{B}_4)\,\mathbf{P}_8(\pi_8)$. Next, we permute the rows of the matrix $\mathbf{Q}_8$ according to the permutation $\pi_8$. As a result of such a permutation, we obtain the matrix
$$\tilde{\mathbf{Q}}_8 = \begin{bmatrix} \mathbf{C}_4 & \mathbf{C}_4 \\ \mathbf{D}_4 & -\mathbf{D}_4 \end{bmatrix},$$
where
$$\mathbf{C}_4 = \begin{bmatrix} a & a & a & a \\ c & g & -c & -g \\ e & -e & e & -e \\ g & c & -g & -c \end{bmatrix} \quad \text{and} \quad \mathbf{D}_4 = \begin{bmatrix} b & d & f & h \\ d & -b & -h & f \\ f & -h & -b & d \\ h & f & d & b \end{bmatrix}.$$
Then the matrix $\mathbf{Q}_8$ can be represented as the product $\mathbf{P}_8(\pi_8)^{\mathrm{T}}\tilde{\mathbf{Q}}_8 = \mathbf{P}_8(\pi_8)^{\mathrm{T}}(\mathbf{C}_4 \oplus \mathbf{D}_4)\,\mathbf{W}_8^{(0)}$. Taking into account the above factorization schemes, we can write
$$\mathbf{Y}_{16 \times 1} = \mathbf{W}_{16}^{(0)} \tilde{\mathbf{W}}_{16}^{(0)}(\mathbf{A}_4 \oplus \mathbf{B}_4 \oplus \mathbf{C}_4 \oplus \mathbf{D}_4)\,\tilde{\mathbf{W}}_{16}^{(1)}\,\mathbf{P}_{16}^{(0)}\,\mathbf{X}_{16 \times 1},$$
where
$$\tilde{\mathbf{W}}_{16}^{(0)} = \mathbf{W}_8^{(0)} \oplus \mathbf{P}_8(\pi_8)^{\mathrm{T}}, \quad \tilde{\mathbf{W}}_{16}^{(1)} = \mathbf{P}_8(\pi_8) \oplus \mathbf{W}_8^{(0)}.$$
We now consider the matrices $\mathbf{A}_4$, $\mathbf{B}_4$, $\mathbf{C}_4$, and $\mathbf{D}_4$. Permuting the columns of the matrix $\mathbf{A}_4$ according to the permutation $\pi_4$, we obtain the matrix
$$\tilde{\mathbf{A}}_4 = \begin{bmatrix} \mathbf{A}_2 & \mathbf{B}_2 \\ \mathbf{A}_2 & -\mathbf{B}_2 \end{bmatrix}.$$
Next, permuting the rows of the matrix $\mathbf{B}_4$ according to the permutation $\pi_4$, we obtain the matrix
$$\tilde{\mathbf{B}}_4 = \begin{bmatrix} \mathbf{C}_2 & \mathbf{C}_2 \\ \mathbf{D}_2 & -\mathbf{D}_2 \end{bmatrix}.$$
Permuting the rows of the matrix $\mathbf{C}_4$ in the same way, we obtain the matrix
$$\tilde{\mathbf{C}}_4 = \begin{bmatrix} \mathbf{F}_2 & \mathbf{F}_2 \\ \mathbf{G}_2 & -\mathbf{G}_2 \end{bmatrix}.$$
(The blocks $\mathbf{A}_2, \dots, \mathbf{G}_2$ are listed below.)
Next, we define the permutation
$$\tilde{\pi}_4 = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 1 & 2 & 4 & 3 \end{pmatrix}.$$
Permuting the rows and columns of the matrix $\mathbf{D}_4$ according to the permutation $\tilde{\pi}_4$, we obtain the matrix
$$\tilde{\mathbf{D}}_4 = \begin{bmatrix} b & d & h & f \\ d & -b & f & -h \\ h & f & b & d \\ f & -h & d & -b \end{bmatrix} = \mathbf{P}_4(\tilde{\pi}_4)\,\mathbf{D}_4\,\mathbf{P}_4(\tilde{\pi}_4) = \begin{bmatrix} \tilde{\mathbf{J}}_2 & \tilde{\mathbf{K}}_2 \\ \tilde{\mathbf{K}}_2 & \tilde{\mathbf{J}}_2 \end{bmatrix},$$
where
$$\mathbf{P}_4(\tilde{\pi}_4) = \begin{bmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&0&1 \\ 0&0&1&0 \end{bmatrix}, \quad \tilde{\mathbf{J}}_2 = \begin{bmatrix} b & d \\ d & -b \end{bmatrix}, \quad \tilde{\mathbf{K}}_2 = \begin{bmatrix} h & f \\ f & -h \end{bmatrix}.$$
Then
$$\mathbf{D}_4 = \mathbf{P}_4(\tilde{\pi}_4)(\mathbf{H}_2 \otimes \mathbf{I}_2)\Big[\tfrac{1}{2}(\tilde{\mathbf{J}}_2 + \tilde{\mathbf{K}}_2) \oplus \tfrac{1}{2}(\tilde{\mathbf{J}}_2 - \tilde{\mathbf{K}}_2)\Big](\mathbf{H}_2 \otimes \mathbf{I}_2)\,\mathbf{P}_4(\tilde{\pi}_4),$$
where
$$\tfrac{1}{2}(\tilde{\mathbf{J}}_2 + \tilde{\mathbf{K}}_2) = \frac{1}{2}\begin{bmatrix} b+h & d+f \\ d+f & -(b+h) \end{bmatrix} = \mathbf{J}_2, \quad \tfrac{1}{2}(\tilde{\mathbf{J}}_2 - \tilde{\mathbf{K}}_2) = \frac{1}{2}\begin{bmatrix} b-h & d-f \\ d-f & -(b-h) \end{bmatrix} = \mathbf{K}_2.$$
Taking into account the matrix transformations performed above, we can write
$$\mathbf{Y}_{16 \times 1} = \mathbf{W}_{16}^{(0)} \tilde{\mathbf{W}}_{16}^{(0)} \mathbf{W}_{16}^{(4)} \mathbf{P}_{16}^{(4)} \mathbf{D}_{16} \mathbf{P}_{16}^{(3)} \mathbf{W}_{16}^{(3)} \tilde{\mathbf{W}}_{16}^{(1)}\,\mathbf{P}_{16}^{(0)}\,\mathbf{X}_{16 \times 1},$$
where
$$\mathbf{D}_{16} = \mathbf{A}_2 \oplus \mathbf{B}_2 \oplus \mathbf{C}_2 \oplus \mathbf{D}_2 \oplus \mathbf{F}_2 \oplus \mathbf{G}_2 \oplus \mathbf{J}_2 \oplus \mathbf{K}_2,$$
$$\mathbf{P}_{16}^{(4)} = \mathbf{I}_4 \oplus \mathbf{P}_4(\pi_4) \oplus \mathbf{P}_4(\pi_4) \oplus \mathbf{P}_4(\tilde{\pi}_4), \quad \mathbf{W}_{16}^{(4)} = \mathbf{W}_4^{(0)} \oplus \mathbf{I}_8 \oplus \mathbf{W}_4^{(0)},$$
$$\mathbf{W}_{16}^{(3)} = \mathbf{I}_4 \oplus \mathbf{W}_4^{(0)} \oplus \mathbf{W}_4^{(0)} \oplus \mathbf{W}_4^{(0)}, \quad \mathbf{P}_{16}^{(3)} = \mathbf{P}_4(\pi_4) \oplus \mathbf{I}_8 \oplus \mathbf{P}_4(\tilde{\pi}_4),$$
$$\mathbf{A}_2 = \begin{bmatrix} a & a \\ a & -a \end{bmatrix}, \quad \mathbf{B}_2 = \begin{bmatrix} a & a \\ e & -e \end{bmatrix}, \quad \mathbf{C}_2 = \mathbf{F}_2 = \begin{bmatrix} a & a \\ e & -e \end{bmatrix}, \quad \mathbf{D}_2 = \mathbf{G}_2 = \begin{bmatrix} c & g \\ g & c \end{bmatrix},$$
$$\mathbf{J}_2 = \frac{1}{2}\begin{bmatrix} b+h & d+f \\ d+f & -(b+h) \end{bmatrix}, \quad \mathbf{K}_2 = \frac{1}{2}\begin{bmatrix} b-h & d-f \\ d-f & -(b-h) \end{bmatrix}.$$
In turn, the matrices $\mathbf{A}_2$, $\mathbf{B}_2$, $\mathbf{C}_2$, $\mathbf{D}_2$, $\mathbf{F}_2$, $\mathbf{G}_2$, $\mathbf{J}_2$ and $\mathbf{K}_2$ also have structures that admit effective factorization, which decreases the multiplicative complexity of the calculations:
$$\mathbf{A}_2 = (a_{16} \oplus a_{16})\,\mathbf{H}_2, \quad \mathbf{B}_2 = (a_{16} \oplus e_{16})\,\mathbf{H}_2, \quad \mathbf{C}_2 = \mathbf{F}_2 = (a_{16} \oplus e_{16})\,\mathbf{H}_2,$$
$$\mathbf{D}_2 = \mathbf{G}_2 = \mathbf{H}_2\,\frac{1}{2}\big[(c_{16} + g_{16}) \oplus (c_{16} - g_{16})\big]\,\mathbf{H}_2,$$
$$\mathbf{J}_2 = \mathbf{T}_{2 \times 3}\,\frac{1}{2}\begin{bmatrix} j_{21} & 0 & 0 \\ 0 & j_{22} & 0 \\ 0 & 0 & j_{23} \end{bmatrix}\mathbf{T}_{3 \times 2}, \quad \mathbf{K}_2 = \mathbf{T}_{2 \times 3}\,\frac{1}{2}\begin{bmatrix} k_{21} & 0 & 0 \\ 0 & k_{22} & 0 \\ 0 & 0 & k_{23} \end{bmatrix}\mathbf{T}_{3 \times 2},$$
where
$$\mathbf{T}_{2 \times 3} = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix}, \quad \mathbf{T}_{3 \times 2} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 1 \end{bmatrix},$$
$$j_{21} = (b_{16} + h_{16}) - (d_{16} + f_{16}), \quad j_{22} = -\big[(b_{16} + h_{16}) + (d_{16} + f_{16})\big], \quad j_{23} = d_{16} + f_{16},$$
$$k_{21} = (b_{16} - h_{16}) - (d_{16} - f_{16}), \quad k_{22} = -\big[(b_{16} - h_{16}) + (d_{16} - f_{16})\big], \quad k_{23} = d_{16} - f_{16}.$$
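The T-matrix scheme computes a product by a 2 × 2 matrix of the form $\frac{1}{2}\begin{bmatrix} p & q \\ q & -p \end{bmatrix}$ with three multiplications instead of four. A numerical sketch for $\mathbf{J}_2$ (variable names ours):

```python
import numpy as np

w = np.exp(-2j * np.pi / 16)
b16, d16, f16, h16 = w, w**3, w**5, w**7

p, q = b16 + h16, d16 + f16
J2 = 0.5 * np.array([[p, q], [q, -p]])

T23 = np.array([[1, 0, 1], [0, 1, 1]])
T32 = np.array([[1, 0], [0, 1], [1, 1]])
j21, j22, j23 = p - q, -(p + q), q      # the three diagonal multipliers
J2_fact = T23 @ (0.5 * np.diag([j21, j22, j23])) @ T32
```

The same pattern with the difference terms $b_{16} - h_{16}$ and $d_{16} - f_{16}$ factorizes $\mathbf{K}_2$.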
Combining the above partial decompositions in a single procedure, we can rewrite (13) as follows:
$$\mathbf{Y}_{16 \times 1} = \mathbf{W}_{16}^{(0)} \tilde{\mathbf{W}}_{16}^{(0)} \mathbf{W}_{16}^{(4)} \mathbf{P}_{16}^{(4)} \mathbf{W}_{16}^{(6)} \mathbf{A}_{16 \times 18} \mathbf{D}_{18} \mathbf{A}_{18 \times 16} \mathbf{W}_{16}^{(5)} \mathbf{P}_{16}^{(3)} \mathbf{W}_{16}^{(3)} \tilde{\mathbf{W}}_{16}^{(1)}\,\mathbf{P}_{16}^{(0)}\,\mathbf{X}_{16 \times 1},$$
where
$$\mathbf{W}_{16}^{(5)} = (\mathbf{I}_6 \otimes \mathbf{H}_2) \oplus \mathbf{I}_4, \quad \mathbf{W}_{16}^{(6)} = \mathbf{I}_6 \oplus \mathbf{H}_2 \oplus \mathbf{I}_2 \oplus \mathbf{H}_2 \oplus \mathbf{I}_4,$$
$$\mathbf{A}_{16 \times 18} = \mathbf{I}_{12} \oplus \mathbf{T}_{2 \times 3} \oplus \mathbf{T}_{2 \times 3}, \quad \mathbf{A}_{18 \times 16} = \mathbf{I}_{12} \oplus \mathbf{T}_{3 \times 2} \oplus \mathbf{T}_{3 \times 2},$$
$$\mathbf{D}_{18} = \mathrm{diag}(\varphi_0, \varphi_1, \dots, \varphi_{17}),$$
$$\varphi_0 = \varphi_1 = \varphi_2 = \varphi_4 = \varphi_8 = a_{16} = 1, \quad \varphi_3 = \varphi_5 = \varphi_9 = e_{16} = -j,$$
$$\varphi_6 = \varphi_{10} = \tfrac{1}{2}(c_{16} + g_{16}) = -j0.7071, \quad \varphi_7 = \varphi_{11} = \tfrac{1}{2}(c_{16} - g_{16}) = 0.7071,$$
$$\varphi_{12} = \tfrac{1}{2}\big[(b_{16} + h_{16}) - (d_{16} + f_{16})\big] = j0.5412,$$
$$\varphi_{13} = -\tfrac{1}{2}\big[(b_{16} + h_{16}) + (d_{16} + f_{16})\big] = j1.3066,$$
$$\varphi_{14} = \tfrac{1}{2}(d_{16} + f_{16}) = -j0.9239,$$
$$\varphi_{15} = \tfrac{1}{2}\big[(b_{16} - h_{16}) - (d_{16} - f_{16})\big] = 0.5412,$$
$$\varphi_{16} = -\tfrac{1}{2}\big[(b_{16} - h_{16}) + (d_{16} - f_{16})\big] = -1.3066,$$
$$\varphi_{17} = \tfrac{1}{2}(d_{16} - f_{16}) = 0.3827.$$
Figure 3 shows a data flow graph of the synthesized algorithm for the 16-point DFT. As can be seen, in this case the algorithm takes 10 multiplications and 74 additions.
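The nesting noted after (3) — each expansion contains the smaller ones — suggests a recursive sketch of scheme (4). Here the prefix matrix Q is applied directly rather than factorized, so this checks the correctness of the decomposition, not the final operation counts (function name ours):

```python
import numpy as np

def winograd_type_dft(x):
    """Recursive DFT via Y = (H2 (x) I)(E (+) Q) P X from scheme (4)."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    if N == 1:
        return x.copy()
    M = N // 2
    k = np.arange(M)[:, None]
    n = np.arange(M)[None, :]
    Q = np.exp(-2j * np.pi * k * (2 * n + 1) / N)  # prefix matrix Q_M
    t = winograd_type_dft(x[0::2])                 # E_M applied to the even samples
    u = Q @ x[1::2]                                # Q_M applied to the odd samples
    return np.concatenate([t + u, t - u])          # butterfly H2 (x) I_M
```

For any power-of-two length this reproduces the DFT; the paper's contribution is the further factorization of each Q into the sparse and diagonal factors listed above.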

4.4. Fast DFT Algorithm for N = 32

Now let us consider the synthesis of a similar algorithm for N = 32. In matrix–vector notation, we can rewrite the DFT in the following form:
$$\mathbf{Y}_{32 \times 1} = \mathbf{E}_{32} \mathbf{X}_{32 \times 1},$$
where
$$\mathbf{X}_{32 \times 1} = [x_0, x_1, x_2, \dots, x_{29}, x_{30}, x_{31}]^{\mathrm{T}}, \quad \mathbf{Y}_{32 \times 1} = [y_0, y_1, y_2, \dots, y_{29}, y_{30}, y_{31}]^{\mathrm{T}}.$$
Let us define the permutation $\pi_{32}$ in the following form (positions 1–16 receive the odd-numbered samples and positions 17–32 the even-numbered ones):
$$\pi_{32} = \begin{pmatrix} 1 & 2 & 3 & \cdots & 16 & 17 & 18 & \cdots & 32 \\ 1 & 3 & 5 & \cdots & 31 & 2 & 4 & \cdots & 32 \end{pmatrix}.$$
Permuting the columns of the matrix $\mathbf{E}_{32}$ according to the permutation $\pi_{32}$, we obtain the matrix
$$\tilde{\mathbf{E}}_{32} = \begin{bmatrix} \mathbf{E}_{16} & \mathbf{Q}_{16} \\ \mathbf{E}_{16} & -\mathbf{Q}_{16} \end{bmatrix}, \quad \text{so that} \quad \mathbf{E}_{32} = \tilde{\mathbf{E}}_{32}\,\mathbf{P}_{32}(\pi_{32}).$$
According to the concept, expression (4) for a given transform size can be rewritten as follows:
$$\mathbf{Y}_{32 \times 1} = \mathbf{W}_{32}^{(0)}(\mathbf{E}_{16} \oplus \mathbf{Q}_{16})\,\mathbf{P}_{32}(\pi_{32})\,\mathbf{X}_{32 \times 1},$$
where
$$\mathbf{W}_{32}^{(0)} = \mathbf{H}_2 \otimes \mathbf{I}_{16},$$
$$\mathbf{P}_{32}(\pi_{32}) = \begin{bmatrix} \check{\mathbf{P}}_{16 \times 32} \\ \hat{\mathbf{P}}_{16 \times 32} \end{bmatrix}, \quad \check{\mathbf{P}}_{16 \times 32} = \bigoplus_{i=0}^{3} \check{\mathbf{P}}_{4 \times 8}^{(i)}, \quad \hat{\mathbf{P}}_{16 \times 32} = \bigoplus_{i=0}^{3} \hat{\mathbf{P}}_{4 \times 8}^{(i)},$$
$$\check{\mathbf{P}}_{4 \times 8}^{(i)} = \begin{bmatrix} 1&0&0&0&0&0&0&0 \\ 0&0&1&0&0&0&0&0 \\ 0&0&0&0&1&0&0&0 \\ 0&0&0&0&0&0&1&0 \end{bmatrix}, \quad \hat{\mathbf{P}}_{4 \times 8}^{(i)} = \begin{bmatrix} 0&1&0&0&0&0&0&0 \\ 0&0&0&1&0&0&0&0 \\ 0&0&0&0&0&1&0&0 \\ 0&0&0&0&0&0&0&1 \end{bmatrix}.$$
E 16 is the same as in the algorithm for N = 16, so we will skip this part.
$\mathbf{Q}_{16}$ is a new matrix. Permuting the rows of the matrix $\mathbf{Q}_{16}$ according to the permutation $\pi_{16}$, we obtain the matrix
$$\tilde{\mathbf{Q}}_{16} = \begin{bmatrix} \mathbf{A}_8 & \mathbf{A}_8 \\ \mathbf{B}_8 & -\mathbf{B}_8 \end{bmatrix} = \mathbf{P}_{16}(\pi_{16})\,\mathbf{Q}_{16},$$
where (all entries carry the subscript 32, omitted for brevity)
$$\mathbf{A}_8 = \begin{bmatrix}
a & a & a & a & a & a & a & a \\
j & k & l & m & -j & -k & -l & -m \\
n & o & -n & -o & n & o & -n & -o \\
k & -j & -m & l & -k & j & m & -l \\
p & -p & p & -p & p & -p & p & -p \\
l & -m & -j & k & -l & m & j & -k \\
o & n & -o & -n & o & n & -o & -n \\
m & l & k & j & -m & -l & -k & -j
\end{bmatrix},$$
$$\mathbf{B}_8 = \begin{bmatrix}
b & c & d & e & f & g & h & i \\
c & f & i & -d & -g & b & e & h \\
d & i & -f & c & h & -e & b & g \\
e & -d & c & -b & -i & h & -g & f \\
f & -g & h & -i & -b & c & -d & e \\
g & b & -e & h & c & -f & i & d \\
h & e & b & -g & -d & i & f & c \\
i & h & g & f & e & d & c & b
\end{bmatrix},$$
where
$$a_{32} = 1,$$
$$b_{32} = 0.9808 - j0.1951, \quad c_{32} = 0.8315 - j0.5556, \quad d_{32} = 0.5556 - j0.8315, \quad e_{32} = 0.1951 - j0.9808,$$
$$f_{32} = -0.1951 - j0.9808, \quad g_{32} = -0.5556 - j0.8315, \quad h_{32} = -0.8315 - j0.5556, \quad i_{32} = -0.9808 - j0.1951,$$
$$j_{32} = 0.9239 - j0.3827, \quad k_{32} = 0.3827 - j0.9239, \quad l_{32} = -0.3827 - j0.9239, \quad m_{32} = -0.9239 - j0.3827,$$
$$n_{32} = 0.7071 - j0.7071, \quad o_{32} = -0.7071 - j0.7071, \quad p_{32} = -j.$$
Taking into account the above factorization scheme, we can write
$$\mathbf{Y}_{32 \times 1} = \mathbf{W}_{32}^{(0)} \tilde{\mathbf{W}}_{32}^{(0)}(\mathbf{E}_8 \oplus \mathbf{Q}_8 \oplus \mathbf{A}_8 \oplus \mathbf{B}_8)\,\tilde{\mathbf{W}}_{32}^{(1)}\,\mathbf{P}_{32}(\pi_{32})\,\mathbf{X}_{32 \times 1},$$
where
$$\check{\mathbf{P}}_{8 \times 4} = \check{\mathbf{P}}_{4 \times 8}^{\mathrm{T}}, \quad \hat{\mathbf{P}}_{8 \times 4} = \hat{\mathbf{P}}_{4 \times 8}^{\mathrm{T}},$$
$$\check{\mathbf{P}}_{16 \times 8} = \check{\mathbf{P}}_{8 \times 4} \oplus \check{\mathbf{P}}_{8 \times 4}, \quad \hat{\mathbf{P}}_{16 \times 8} = \hat{\mathbf{P}}_{8 \times 4} \oplus \hat{\mathbf{P}}_{8 \times 4},$$
$$\dot{\mathbf{P}}_{16} = \begin{bmatrix} \check{\mathbf{P}}_{16 \times 8} & \hat{\mathbf{P}}_{16 \times 8} \end{bmatrix},$$
$$\tilde{\mathbf{W}}_{32}^{(0)} = \mathbf{W}_{16}^{(0)} \oplus \dot{\mathbf{P}}_{16}, \quad \tilde{\mathbf{W}}_{32}^{(1)} = \dot{\mathbf{P}}_{16} \oplus \mathbf{W}_{16}^{(0)}.$$
$\mathbf{E}_8$ and $\mathbf{Q}_8$ are the same as in the algorithm for N = 16, so we will skip this part.
$\mathbf{A}_8$ and $\mathbf{B}_8$ are new matrices from the bottom half of the algorithm for N = 32. Permuting the rows of the matrix $\mathbf{A}_8$ according to the permutation $\pi_8$, we obtain the matrix
$$\tilde{\mathbf{A}}_8 = \begin{bmatrix} \mathbf{F}_4 & \mathbf{F}_4 \\ \mathbf{G}_4 & -\mathbf{G}_4 \end{bmatrix} = \mathbf{P}_8(\pi_8)\,\mathbf{A}_8,$$
where
$$\mathbf{F}_4 = \begin{bmatrix} a & a & a & a \\ n & o & -n & -o \\ p & -p & p & -p \\ o & n & -o & -n \end{bmatrix} \quad \text{and} \quad \mathbf{G}_4 = \begin{bmatrix} j & k & l & m \\ k & -j & -m & l \\ l & -m & -j & k \\ m & l & k & j \end{bmatrix}.$$
Let us now define the permutation $\pi_8^{(2)}$ in the following form:
$$\pi_8^{(2)} = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 1 & 2 & 3 & 4 & 8 & 7 & 6 & 5 \end{pmatrix}.$$
Permutation $\pi_8^{(2)}$ can be written as a matrix in this way:
$$\mathbf{P}_8(\pi_8^{(2)}) = \begin{bmatrix}
1&0&0&0&0&0&0&0 \\
0&1&0&0&0&0&0&0 \\
0&0&1&0&0&0&0&0 \\
0&0&0&1&0&0&0&0 \\
0&0&0&0&0&0&0&1 \\
0&0&0&0&0&0&1&0 \\
0&0&0&0&0&1&0&0 \\
0&0&0&0&1&0&0&0
\end{bmatrix}.$$
Then, we permute the rows and columns of the matrix $\mathbf{B}_8$ according to the permutation $\pi_8^{(2)}$. As a result of such a permutation, we obtain the matrix
$$\tilde{\mathbf{B}}_8 = \begin{bmatrix} \mathbf{J}_4 & \mathbf{K}_4 \\ \mathbf{K}_4 & \mathbf{J}_4 \end{bmatrix} = \mathbf{P}_8(\pi_8^{(2)})\,\mathbf{B}_8\,\mathbf{P}_8(\pi_8^{(2)}),$$
where
$$\mathbf{J}_4 = \begin{bmatrix} b & c & d & e \\ c & f & i & -d \\ d & i & -f & c \\ e & -d & c & -b \end{bmatrix} \quad \text{and} \quad \mathbf{K}_4 = \begin{bmatrix} i & h & g & f \\ h & e & b & -g \\ g & b & -e & h \\ f & -g & h & -i \end{bmatrix}.$$
To reduce the computational complexity for the matrix $\mathbf{B}_8$, we perform the following calculations:
$$\tilde{\mathbf{B}}_8 = \begin{bmatrix} \mathbf{J}_4 & \mathbf{K}_4 \\ \mathbf{K}_4 & \mathbf{J}_4 \end{bmatrix} = (\mathbf{H}_2 \otimes \mathbf{I}_4)(\tilde{\mathbf{J}}_4 \oplus \tilde{\mathbf{K}}_4)(\mathbf{H}_2 \otimes \mathbf{I}_4),$$
where
$$\tilde{\mathbf{J}}_4 = \tfrac{1}{2}(\mathbf{J}_4 + \mathbf{K}_4) = \frac{1}{2}\begin{bmatrix}
b+i & c+h & d+g & e+f \\
c+h & f+e & i+b & -(d+g) \\
d+g & i+b & -(f+e) & c+h \\
e+f & -(d+g) & c+h & -(b+i)
\end{bmatrix}$$
and
$$\tilde{\mathbf{K}}_4 = \tfrac{1}{2}(\mathbf{J}_4 - \mathbf{K}_4) = \frac{1}{2}\begin{bmatrix}
b-i & c-h & d-g & e-f \\
c-h & f-e & i-b & -(d-g) \\
d-g & i-b & -(f-e) & c-h \\
e-f & -(d-g) & c-h & -(b-i)
\end{bmatrix}.$$
Taking into account the above factorization scheme, we can write
$$\mathbf{Y}_{32 \times 1} = \mathbf{W}_{32}^{(0)} \tilde{\mathbf{W}}_{32}^{(0)} \mathbf{P}_{32}^{(3)} \mathbf{W}_{32}^{(3)} \mathbf{D}_{32} \mathbf{W}_{32}^{(4)} \mathbf{P}_{32}^{(4)} \tilde{\mathbf{W}}_{32}^{(1)}\,\mathbf{P}_{32}(\pi_{32})\,\mathbf{X}_{32 \times 1},$$
where
$$\mathbf{P}_{32}^{(3)} = \mathbf{P}_{16}^{(2)} \oplus \mathbf{P}_8(\pi_8) \oplus \mathbf{P}_8(\pi_8^{(2)}), \quad \mathbf{W}_{32}^{(3)} = \mathbf{W}_{16}^{(2)} \oplus \mathbf{I}_8 \oplus \mathbf{W}_8^{(0)},$$
$$\mathbf{D}_{32} = \mathbf{A}_4 \oplus \mathbf{B}_4 \oplus \mathbf{C}_4 \oplus \mathbf{D}_4 \oplus \mathbf{F}_4 \oplus \mathbf{G}_4 \oplus \tilde{\mathbf{J}}_4 \oplus \tilde{\mathbf{K}}_4,$$
$$\mathbf{W}_{32}^{(4)} = \mathbf{W}_{16}^{(1)} \oplus \mathbf{W}_8^{(0)} \oplus \mathbf{W}_8^{(0)}, \quad \mathbf{P}_{32}^{(4)} = \mathbf{P}_{16}^{(1)} \oplus \mathbf{I}_8 \oplus \mathbf{P}_8(\pi_8^{(2)}),$$
where
$$\mathbf{P}_{16}^{(1)} = \mathbf{P}_8(\pi_8) \oplus \mathbf{I}_8, \quad \mathbf{P}_{16}^{(2)} = \mathbf{I}_8 \oplus \mathbf{P}_8(\pi_8)^{\mathrm{T}},$$
$$\mathbf{W}_{16}^{(1)} = \mathbf{I}_8 \oplus \mathbf{W}_8^{(0)}, \quad \mathbf{W}_{16}^{(2)} = \mathbf{W}_8^{(0)} \oplus \mathbf{I}_8.$$
$\mathbf{A}_4$, $\mathbf{B}_4$, $\mathbf{C}_4$ and $\mathbf{D}_4$ are the same as in the algorithm for N = 16, so we will skip this part. $\mathbf{F}_4$, $\mathbf{G}_4$, $\tilde{\mathbf{J}}_4$ and $\tilde{\mathbf{K}}_4$ are new matrices from the bottom half of the algorithm for N = 32. Permuting the rows of the matrix $\mathbf{F}_4$ according to the permutation $\pi_4$, we obtain the matrix
$$\tilde{\mathbf{F}}_4 = \begin{bmatrix} \mathbf{L}_2 & \mathbf{L}_2 \\ \mathbf{M}_2 & -\mathbf{M}_2 \end{bmatrix} = \mathbf{P}_4(\pi_4)\,\mathbf{F}_4,$$
where
$$\mathbf{L}_2 = \begin{bmatrix} a & a \\ p & -p \end{bmatrix} \quad \text{and} \quad \mathbf{M}_2 = \begin{bmatrix} n & o \\ o & n \end{bmatrix}.$$
Permuting the rows and columns of the matrix $\mathbf{G}_4$ according to the permutation $\tilde{\pi}_4$, we obtain the matrix
$$\tilde{\mathbf{G}}_4 = \begin{bmatrix} \mathbf{N}_2 & \mathbf{O}_2 \\ \mathbf{O}_2 & \mathbf{N}_2 \end{bmatrix} = \mathbf{P}_4(\tilde{\pi}_4)\,\mathbf{G}_4\,\mathbf{P}_4(\tilde{\pi}_4),$$
where
$$\mathbf{N}_2 = \begin{bmatrix} j & k \\ k & -j \end{bmatrix} \quad \text{and} \quad \mathbf{O}_2 = \begin{bmatrix} m & l \\ l & -m \end{bmatrix}.$$
To reduce the computational complexity for the matrix $\mathbf{G}_4$, we perform the following calculations:
$$\tilde{\mathbf{G}}_4 = \begin{bmatrix} \mathbf{N}_2 & \mathbf{O}_2 \\ \mathbf{O}_2 & \mathbf{N}_2 \end{bmatrix} = (\mathbf{H}_2 \otimes \mathbf{I}_2)(\tilde{\mathbf{N}}_2 \oplus \tilde{\mathbf{O}}_2)(\mathbf{H}_2 \otimes \mathbf{I}_2),$$
where
$$\tilde{\mathbf{N}}_2 = \tfrac{1}{2}(\mathbf{N}_2 + \mathbf{O}_2) = \frac{1}{2}\begin{bmatrix} j+m & k+l \\ k+l & -(j+m) \end{bmatrix}$$
and
$$\tilde{\mathbf{O}}_2 = \tfrac{1}{2}(\mathbf{N}_2 - \mathbf{O}_2) = \frac{1}{2}\begin{bmatrix} j-m & k-l \\ k-l & -(j-m) \end{bmatrix}.$$
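The block identity behind this step — $\begin{bmatrix} \mathbf{N} & \mathbf{O} \\ \mathbf{O} & \mathbf{N} \end{bmatrix} = (\mathbf{H}_2 \otimes \mathbf{I}_2)(\tilde{\mathbf{N}}_2 \oplus \tilde{\mathbf{O}}_2)(\mathbf{H}_2 \otimes \mathbf{I}_2)$ — can be confirmed numerically. Below we assume $j_{32} = w^2$, $k_{32} = w^6$, $l_{32} = w^{10}$, $m_{32} = w^{14}$ with $w = e^{-j2\pi/32}$, matching the listed constant values:

```python
import numpy as np

H2 = np.array([[1, 1], [1, -1]], dtype=complex)
w = np.exp(-2j * np.pi / 32)
j32, k32, l32, m32 = w**2, w**6, w**10, w**14   # assumed twiddle identifications

N2 = np.array([[j32, k32], [k32, -j32]])
O2 = np.array([[m32, l32], [l32, -m32]])
G4t = np.block([[N2, O2], [O2, N2]])            # the permuted matrix G~4
H4 = np.kron(H2, np.eye(2))                     # H2 (x) I2
Nt = (N2 + O2) / 2                              # N~2
Ot = (N2 - O2) / 2                              # O~2
zeros = np.zeros((2, 2))
G4t_fact = H4 @ np.block([[Nt, zeros], [zeros, Ot]]) @ H4
```

The half-sum block comes out purely imaginary and the half-difference block purely real, which is what makes the subsequent multipliers cheap.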
Let us define the permutations $\pi_4^{(1)}$ and $\pi_4^{(2)}$ in the following form:
$$\pi_4^{(1)} = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 3 & 2 & 1 & 4 \end{pmatrix} \quad \text{and} \quad \pi_4^{(2)} = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 1 & 4 & 3 & 2 \end{pmatrix}.$$
Permutations $\pi_4^{(1)}$ and $\pi_4^{(2)}$ can be written as matrices in this way:
$$\mathbf{P}_4(\pi_4^{(1)}) = \begin{bmatrix} 0&0&1&0 \\ 0&1&0&0 \\ 1&0&0&0 \\ 0&0&0&1 \end{bmatrix} \quad \text{and} \quad \mathbf{P}_4(\pi_4^{(2)}) = \begin{bmatrix} 1&0&0&0 \\ 0&0&0&1 \\ 0&0&1&0 \\ 0&1&0&0 \end{bmatrix}.$$
Permute the rows of the matrix $\tilde{\mathbf{J}}_4$ according to the permutation $\pi_4^{(1)}$ and its columns according to $\pi_4^{(2)}$. As a result of such a permutation, we obtain the matrix
$$\dot{\mathbf{J}}_4 = \begin{bmatrix} \mathbf{P}_2 & \mathbf{R}_2 \\ \mathbf{S}_2 & \mathbf{P}_2 \end{bmatrix} = \mathbf{P}_4(\pi_4^{(1)})\,\tilde{\mathbf{J}}_4\,\mathbf{P}_4(\pi_4^{(2)}),$$
where
$$\mathbf{P}_2 = \frac{1}{2}\begin{bmatrix} d+g & c+h \\ c+h & -(d+g) \end{bmatrix}, \quad \mathbf{R}_2 = \frac{1}{2}\begin{bmatrix} -(e+f) & b+i \\ b+i & e+f \end{bmatrix},$$
$$\mathbf{S}_2 = \frac{1}{2}\begin{bmatrix} b+i & e+f \\ e+f & -(b+i) \end{bmatrix}.$$
To reduce the computational complexity for the matrix $\dot{\mathbf{J}}_4$, we perform the following calculations:
$$\tilde{\mathbf{J}}_4 = \big(\mathbf{T}_{2 \times 3}^{(1)} \otimes \mathbf{H}_2\big)\big[(\mathbf{S}_2 - \mathbf{P}_2) \oplus (\mathbf{R}_2 - \mathbf{P}_2) \oplus \mathbf{P}_2\big]\big(\mathbf{T}_{3 \times 2} \otimes \mathbf{H}_2\big),$$
where
$$\mathbf{T}_{2 \times 3}^{(1)} = \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix}.$$
The same permutations as for the matrix $\tilde{\mathbf{J}}_4$ are applied to the matrix $\tilde{\mathbf{K}}_4$. As a result of such permutations, we obtain the matrix
$$\dot{\mathbf{K}}_4 = \begin{bmatrix} \mathbf{T}_2 & \mathbf{U}_2 \\ \mathbf{W}_2 & \mathbf{T}_2 \end{bmatrix} = \mathbf{P}_4(\pi_4^{(1)})\,\tilde{\mathbf{K}}_4\,\mathbf{P}_4(\pi_4^{(2)}),$$
where
$$\mathbf{T}_2 = \frac{1}{2}\begin{bmatrix} d-g & c-h \\ c-h & -(d-g) \end{bmatrix}, \quad \mathbf{U}_2 = \frac{1}{2}\begin{bmatrix} e-f & i-b \\ i-b & -(e-f) \end{bmatrix},$$
$$\mathbf{W}_2 = \frac{1}{2}\begin{bmatrix} b-i & e-f \\ e-f & -(b-i) \end{bmatrix}.$$
To reduce the computational complexity for the matrix $\dot{\mathbf{K}}_4$, we perform the following calculations:
$$\tilde{\mathbf{K}}_4 = \big(\mathbf{T}_{2 \times 3}^{(1)} \otimes \mathbf{H}_2\big)\big[(\mathbf{W}_2 - \mathbf{T}_2) \oplus (\mathbf{U}_2 - \mathbf{T}_2) \oplus \mathbf{T}_2\big]\big(\mathbf{T}_{3 \times 2} \otimes \mathbf{H}_2\big).$$
Taking into account the above factorization scheme, we can write
$$\mathbf{Y}_{32 \times 1} = \mathbf{W}_{32}^{(0)} \tilde{\mathbf{W}}_{32}^{(0)} \mathbf{P}_{32}^{(3)} \mathbf{W}_{32}^{(3)} \mathbf{P}_{32}^{(5)} \mathbf{W}_{32 \times 36} \mathbf{D}_{36} \mathbf{W}_{36 \times 32} \mathbf{P}_{32}^{(6)} \mathbf{W}_{32}^{(4)} \mathbf{P}_{32}^{(4)} \tilde{\mathbf{W}}_{32}^{(1)}\,\mathbf{P}_{32}(\pi_{32})\,\mathbf{X}_{32 \times 1},$$
where
$$\mathbf{P}_{32}^{(5)} = \mathbf{P}_{16}^{(4)} \oplus \mathbf{P}_4(\pi_4) \oplus \mathbf{P}_4(\tilde{\pi}_4) \oplus \mathbf{P}_4(\pi_4^{(1)}) \oplus \mathbf{P}_4(\pi_4^{(1)}),$$
$$\mathbf{W}_{32 \times 36} = \mathbf{W}_{16}^{(4)} \oplus \mathbf{I}_4 \oplus \mathbf{W}_4^{(0)} \oplus \big(\mathbf{T}_{2 \times 3}^{(1)} \otimes \mathbf{H}_2\big) \oplus \big(\mathbf{T}_{2 \times 3}^{(1)} \otimes \mathbf{H}_2\big),$$
$$\mathbf{D}_{36} = \mathbf{D}_{18}^{(1)} \oplus \mathbf{D}_{18}^{(2)},$$
$$\mathbf{D}_{18}^{(1)} = \mathbf{A}_2 \oplus \mathbf{B}_2 \oplus \mathbf{C}_2 \oplus \mathbf{D}_2 \oplus \mathbf{F}_2 \oplus \mathbf{G}_2 \oplus \mathbf{J}_2 \oplus \mathbf{K}_2,$$
$$\mathbf{D}_{18}^{(2)} = \mathbf{L}_2 \oplus \mathbf{M}_2 \oplus \mathbf{N}_2 \oplus \mathbf{O}_2 \oplus \mathbf{P}_2 \oplus \mathbf{R}_2 \oplus \mathbf{S}_2 \oplus \mathbf{T}_2 \oplus \mathbf{U}_2 \oplus \mathbf{W}_2,$$
$$\mathbf{W}_{36 \times 32} = \mathbf{W}_{16}^{(3)} \oplus \mathbf{W}_4^{(0)} \oplus \mathbf{W}_4^{(0)} \oplus \big(\mathbf{T}_{3 \times 2} \otimes \mathbf{H}_2\big) \oplus \big(\mathbf{T}_{3 \times 2} \otimes \mathbf{H}_2\big),$$
$$\mathbf{P}_{32}^{(6)} = \mathbf{P}_{16}^{(3)} \oplus \mathbf{I}_4 \oplus \mathbf{P}_4(\tilde{\pi}_4) \oplus \mathbf{P}_4(\pi_4^{(2)}) \oplus \mathbf{P}_4(\pi_4^{(2)}).$$
In turn, the matrices A 2 , B 2 , C 2 , D 2 , F 2 , G 2 , J 2 and K 2 are the same as in the algorithm for N = 16, so we will skip this part. The matrices L 2 , M 2 , N 2 , O 2 , P 2 , R 2 , S 2 , T 2 , U 2 and W 2 also have structures that provide effective factorization, which leads to a decrease in the multiplicative complexity of calculations:
$$\mathbf{L}_2 = \begin{bmatrix} a_{32} & a_{32} \\ p_{32} & -p_{32} \end{bmatrix} = \begin{bmatrix} a_{32} & 0 \\ 0 & p_{32} \end{bmatrix} \mathbf{H}_2 ,$$

$$\mathbf{M}_2 = \begin{bmatrix} n_{32} & o_{32} \\ o_{32} & n_{32} \end{bmatrix} = \mathbf{H}_2 \operatorname{diag}\Bigl(\tfrac{1}{2}(n_{32}+o_{32}),\ \tfrac{1}{2}(n_{32}-o_{32})\Bigr) \mathbf{H}_2 ,$$

$$\mathbf{N}_2 = \frac{1}{2} \begin{bmatrix} j_{32}+m_{32} & k_{32}+l_{32} \\ k_{32}+l_{32} & -(j_{32}+m_{32}) \end{bmatrix} = \begin{bmatrix} -j0.3827 & -j0.9239 \\ -j0.9239 & j0.3827 \end{bmatrix} = \begin{bmatrix} n_{32}^{(0)} & n_{32}^{(1)} \\ n_{32}^{(1)} & -n_{32}^{(0)} \end{bmatrix} = \mathbf{T}_{2\times 3} \operatorname{diag}\bigl(n_{32}^{(0)}-n_{32}^{(1)},\ n_{32}^{(0)}+n_{32}^{(1)},\ n_{32}^{(1)}\bigr) \mathbf{T}_{3\times 2} ,$$

$$\mathbf{O}_2 = \frac{1}{2} \begin{bmatrix} j_{32}-m_{32} & k_{32}-l_{32} \\ k_{32}-l_{32} & -(j_{32}-m_{32}) \end{bmatrix} = \begin{bmatrix} 0.9239 & 0.3827 \\ 0.3827 & -0.9239 \end{bmatrix} = \begin{bmatrix} o_{32}^{(0)} & o_{32}^{(1)} \\ o_{32}^{(1)} & -o_{32}^{(0)} \end{bmatrix} = \mathbf{T}_{2\times 3} \operatorname{diag}\bigl(o_{32}^{(0)}-o_{32}^{(1)},\ o_{32}^{(0)}+o_{32}^{(1)},\ o_{32}^{(1)}\bigr) \mathbf{T}_{3\times 2} ,$$

$$\mathbf{P}_2 = \frac{1}{2} \begin{bmatrix} b_{32}+i_{32}-(d_{32}+g_{32}) & e_{32}+f_{32}-(c_{32}+h_{32}) \\ e_{32}+f_{32}-(c_{32}+h_{32}) & -\bigl(b_{32}+i_{32}-(d_{32}+g_{32})\bigr) \end{bmatrix} = \begin{bmatrix} j0.6364 & -j0.4252 \\ -j0.4252 & -j0.6364 \end{bmatrix} = \begin{bmatrix} p_{32}^{(0)} & p_{32}^{(1)} \\ p_{32}^{(1)} & -p_{32}^{(0)} \end{bmatrix} = \mathbf{T}_{2\times 3} \operatorname{diag}\bigl(p_{32}^{(0)}-p_{32}^{(1)},\ p_{32}^{(0)}+p_{32}^{(1)},\ p_{32}^{(1)}\bigr) \mathbf{T}_{3\times 2} ,$$

$$\mathbf{R}_2 = \frac{1}{2} \begin{bmatrix} -(e_{32}+f_{32})-(d_{32}+g_{32}) & b_{32}+i_{32}-(c_{32}+h_{32}) \\ b_{32}+i_{32}-(c_{32}+h_{32}) & (e_{32}+f_{32})+(d_{32}+g_{32}) \end{bmatrix} = \begin{bmatrix} j1.8123 & j0.3605 \\ j0.3605 & -j1.8123 \end{bmatrix} = \begin{bmatrix} r_{32}^{(0)} & r_{32}^{(1)} \\ r_{32}^{(1)} & -r_{32}^{(0)} \end{bmatrix} = \mathbf{T}_{2\times 3} \operatorname{diag}\bigl(r_{32}^{(0)}-r_{32}^{(1)},\ r_{32}^{(0)}+r_{32}^{(1)},\ r_{32}^{(1)}\bigr) \mathbf{T}_{3\times 2} ,$$

$$\mathbf{S}_2 = \frac{1}{2} \begin{bmatrix} d_{32}+g_{32} & c_{32}+h_{32} \\ c_{32}+h_{32} & -(d_{32}+g_{32}) \end{bmatrix} = \begin{bmatrix} -j0.8315 & -j0.5556 \\ -j0.5556 & j0.8315 \end{bmatrix} = \begin{bmatrix} s_{32}^{(0)} & s_{32}^{(1)} \\ s_{32}^{(1)} & -s_{32}^{(0)} \end{bmatrix} = \mathbf{T}_{2\times 3} \operatorname{diag}\bigl(s_{32}^{(0)}-s_{32}^{(1)},\ s_{32}^{(0)}+s_{32}^{(1)},\ s_{32}^{(1)}\bigr) \mathbf{T}_{3\times 2} ,$$

$$\mathbf{T}_2 = \frac{1}{2} \begin{bmatrix} b_{32}-i_{32}-(d_{32}-g_{32}) & e_{32}-f_{32}-(c_{32}-h_{32}) \\ e_{32}-f_{32}-(c_{32}-h_{32}) & -\bigl(b_{32}-i_{32}-(d_{32}-g_{32})\bigr) \end{bmatrix} = \begin{bmatrix} 0.4252 & -0.6364 \\ -0.6364 & -0.4252 \end{bmatrix} = \begin{bmatrix} t_{32}^{(0)} & t_{32}^{(1)} \\ t_{32}^{(1)} & -t_{32}^{(0)} \end{bmatrix} = \mathbf{T}_{2\times 3} \operatorname{diag}\bigl(t_{32}^{(0)}-t_{32}^{(1)},\ t_{32}^{(0)}+t_{32}^{(1)},\ t_{32}^{(1)}\bigr) \mathbf{T}_{3\times 2} ,$$

$$\mathbf{U}_2 = \frac{1}{2} \begin{bmatrix} e_{32}-f_{32}-(d_{32}-g_{32}) & i_{32}-b_{32}-(c_{32}-h_{32}) \\ i_{32}-b_{32}-(c_{32}-h_{32}) & -\bigl(e_{32}-f_{32}-(d_{32}-g_{32})\bigr) \end{bmatrix} = \begin{bmatrix} -0.3605 & -1.8123 \\ -1.8123 & 0.3605 \end{bmatrix} = \begin{bmatrix} u_{32}^{(0)} & u_{32}^{(1)} \\ u_{32}^{(1)} & -u_{32}^{(0)} \end{bmatrix} = \mathbf{T}_{2\times 3} \operatorname{diag}\bigl(u_{32}^{(0)}-u_{32}^{(1)},\ u_{32}^{(0)}+u_{32}^{(1)},\ u_{32}^{(1)}\bigr) \mathbf{T}_{3\times 2} ,$$

$$\mathbf{W}_2 = \frac{1}{2} \begin{bmatrix} d_{32}-g_{32} & c_{32}-h_{32} \\ c_{32}-h_{32} & -(d_{32}-g_{32}) \end{bmatrix} = \begin{bmatrix} 0.5556 & 0.8315 \\ 0.8315 & -0.5556 \end{bmatrix} = \begin{bmatrix} w_{32}^{(0)} & w_{32}^{(1)} \\ w_{32}^{(1)} & -w_{32}^{(0)} \end{bmatrix} = \mathbf{T}_{2\times 3} \operatorname{diag}\bigl(w_{32}^{(0)}-w_{32}^{(1)},\ w_{32}^{(0)}+w_{32}^{(1)},\ w_{32}^{(1)}\bigr) \mathbf{T}_{3\times 2} .$$
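Each of these 2×2 blocks is applied to a vector with only three multiplications, since the diagonal entries are precomputed constants. The following sketch illustrates the scheme; the concrete $\mathbf{T}_{2\times 3}$ and $\mathbf{T}_{3\times 2}$ shown here are one consistent choice reproducing the block form above (the paper fixes their exact definitions earlier), and the function name is illustrative:

```python
import numpy as np

# One consistent choice of the fixed matrices T_{2x3} and T_{3x2}
# reproducing blocks of the form [[x0, x1], [x1, -x0]] (assumption:
# the paper defines these matrices earlier in the text).
T23 = np.array([[1.0, 0.0, 1.0],
                [0.0, -1.0, 1.0]])
T32 = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [1.0, 1.0]])

def apply_block(x0, x1, v):
    """Multiply v by [[x0, x1], [x1, -x0]] using only 3 multiplications."""
    m1 = (x0 - x1) * v[0]        # the three diagonal entries
    m2 = (x0 + x1) * v[1]        # (x0 - x1, x0 + x1, x1) are
    m3 = x1 * (v[0] + v[1])      # precomputed constants
    return np.array([m1 + m3, m3 - m2])

# Check against the O_2 block with its numeric values:
x0, x1 = 0.9239, 0.3827
v = np.array([1.0, 2.0])
direct = np.array([[x0, x1], [x1, -x0]]) @ v
assert np.allclose(apply_block(x0, x1, v), direct)
# The same factorization written as T23 @ diag @ T32:
D = np.diag([x0 - x1, x0 + x1, x1])
assert np.allclose(T23 @ D @ T32, np.array([[x0, x1], [x1, -x0]]))
```

The additions introduced by $\mathbf{T}_{2\times 3}$ and $\mathbf{T}_{3\times 2}$ are the price paid for replacing four multiplications with three.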
Taking into account the above factorization scheme, we can finally write
$$\mathbf{Y}_{32\times 1} = \mathbf{W}_{32}^{(0)} \tilde{\mathbf{W}}_{32}^{(0)} \mathbf{P}_{32}^{(3)} \mathbf{W}_{32}^{(3)} \mathbf{P}_{32}^{(5)} \mathbf{W}_{32\times 36} \mathbf{W}_{36}^{(0)} \mathbf{P}_{36\times 46} \mathbf{D}_{46} \mathbf{P}_{46\times 36} \mathbf{W}_{36}^{(1)} \mathbf{W}_{36\times 32} \mathbf{P}_{32}^{(6)} \mathbf{W}_{32}^{(4)} \mathbf{P}_{32}^{(4)} \tilde{\mathbf{W}}_{32}^{(1)} \mathbf{P}_{32}(\pi_{32}) \mathbf{X}_{32\times 1} ,$$
where
$$\mathbf{W}_{36}^{(0)} = \mathbf{W}_{16}^{(6)} \oplus (\mathbf{I}_2 \oplus \mathbf{H}_2) \oplus \mathbf{I}_{16} , \qquad \mathbf{P}_{36\times 46} = \mathbf{A}_{16\times 18} \oplus \mathbf{I}_4 \oplus \bigl(\mathbf{T}_{2\times 3}^{(4)} \otimes \mathbf{I}_8\bigr) ,$$
$$\mathbf{W}_{36}^{(1)} = \mathbf{W}_{16}^{(5)} \oplus \mathbf{H}_2 \oplus \mathbf{H}_2 \oplus \mathbf{I}_{16} , \qquad \mathbf{P}_{46\times 36} = \mathbf{A}_{18\times 16} \oplus \mathbf{I}_4 \oplus \bigl(\mathbf{T}_{3\times 2}^{(3)} \otimes \mathbf{I}_8\bigr) ,$$
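The dimensions of these direct sums can be checked mechanically. The sketch below builds $\mathbf{P}_{36\times 46}$ and $\mathbf{P}_{46\times 36}$ shape-wise under one possible reading of the formulas; the contents of $\mathbf{A}_{16\times 18}$ and $\mathbf{A}_{18\times 16}$ are zero placeholders here (only the sizes are verified), and the $\mathbf{T}$ matrices are the illustrative choice used earlier:

```python
import numpy as np

def direct_sum(*blocks):
    """Block-diagonal (direct sum) of the given matrices."""
    rows = sum(b.shape[0] for b in blocks)
    cols = sum(b.shape[1] for b in blocks)
    out = np.zeros((rows, cols))
    i = j = 0
    for b in blocks:
        out[i:i + b.shape[0], j:j + b.shape[1]] = b
        i += b.shape[0]
        j += b.shape[1]
    return out

T23 = np.array([[1.0, 0.0, 1.0], [0.0, -1.0, 1.0]])   # 2x3
T32 = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # 3x2

# Placeholder shapes only; the real A-matrices are defined in the paper.
A_16x18 = np.zeros((16, 18))
A_18x16 = np.zeros((18, 16))

# 16 + 4 + 2*8 = 36 rows, 18 + 4 + 3*8 = 46 columns:
P_36x46 = direct_sum(A_16x18, np.eye(4), np.kron(T23, np.eye(8)))
P_46x36 = direct_sum(A_18x16, np.eye(4), np.kron(T32, np.eye(8)))
assert P_36x46.shape == (36, 46)
assert P_46x36.shape == (46, 36)
```

The column count 46 matches the size of the diagonal matrix $\mathbf{D}_{46}$: 18 multiplications inherited from the N = 16 part, 2 + 2 for the $\mathbf{L}_2$/$\mathbf{M}_2$ blocks, and 3 each for the eight remaining blocks.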
and finally
$$\mathbf{D}_{46} = \operatorname{diag}(\varphi_0, \varphi_1, \ldots, \varphi_{45}),$$
where
$\varphi_{18} = a_{32} = 1$, $\varphi_{19} = p_{32} = -j$,
$\varphi_{20} = \frac{1}{2}(n_{32}+o_{32}) = -j0.7071$, $\varphi_{21} = \frac{1}{2}(n_{32}-o_{32}) = 0.7071$,
$\varphi_{22} = \frac{1}{2}\bigl((j_{32}+m_{32})-(k_{32}+l_{32})\bigr) = j0.5412$,
$\varphi_{23} = \frac{1}{2}\bigl((j_{32}+m_{32})+(k_{32}+l_{32})\bigr) = -j1.3066$,
$\varphi_{24} = \frac{1}{2}(k_{32}+l_{32}) = -j0.9239$,
$\varphi_{25} = \frac{1}{2}\bigl((j_{32}-m_{32})-(k_{32}-l_{32})\bigr) = 0.5412$,
$\varphi_{26} = \frac{1}{2}\bigl((j_{32}-m_{32})+(k_{32}-l_{32})\bigr) = 1.3066$,
$\varphi_{27} = \frac{1}{2}(k_{32}-l_{32}) = 0.3827$,
$\varphi_{28} = \frac{1}{2}\bigl(b_{32}+i_{32}-(d_{32}+g_{32}) - (e_{32}+f_{32}-(c_{32}+h_{32}))\bigr) = j1.0616$,
$\varphi_{29} = \frac{1}{2}\bigl(b_{32}+i_{32}-(d_{32}+g_{32}) + (e_{32}+f_{32}-(c_{32}+h_{32}))\bigr) = j0.2112$,
$\varphi_{30} = \frac{1}{2}\bigl(e_{32}+f_{32}-(c_{32}+h_{32})\bigr) = -j0.4252$,
$\varphi_{31} = \frac{1}{2}\bigl(-(e_{32}+f_{32})-(d_{32}+g_{32}) - (b_{32}+i_{32}-(c_{32}+h_{32}))\bigr) = j1.4518$,
$\varphi_{32} = \frac{1}{2}\bigl(-(e_{32}+f_{32})-(d_{32}+g_{32}) + (b_{32}+i_{32}-(c_{32}+h_{32}))\bigr) = j2.1727$,
$\varphi_{33} = \frac{1}{2}\bigl(b_{32}+i_{32}-(c_{32}+h_{32})\bigr) = j0.3605$,
$\varphi_{34} = \frac{1}{2}\bigl((d_{32}+g_{32})-(c_{32}+h_{32})\bigr) = -j0.2759$,
$\varphi_{35} = \frac{1}{2}\bigl((d_{32}+g_{32})+(c_{32}+h_{32})\bigr) = -j1.3870$, $\varphi_{36} = \frac{1}{2}(c_{32}+h_{32}) = -j0.5556$,
$\varphi_{37} = \frac{1}{2}\bigl(b_{32}-i_{32}-(d_{32}-g_{32}) - (e_{32}-f_{32}-(c_{32}-h_{32}))\bigr) = 1.0616$,
$\varphi_{38} = \frac{1}{2}\bigl(b_{32}-i_{32}-(d_{32}-g_{32}) + (e_{32}-f_{32}-(c_{32}-h_{32}))\bigr) = -0.2112$,
$\varphi_{39} = \frac{1}{2}\bigl(e_{32}-f_{32}-(c_{32}-h_{32})\bigr) = -0.6364$,
$\varphi_{40} = \frac{1}{2}\bigl(e_{32}-f_{32}-(d_{32}-g_{32}) - (i_{32}-b_{32}-(c_{32}-h_{32}))\bigr) = 1.4518$,
$\varphi_{41} = \frac{1}{2}\bigl(e_{32}-f_{32}-(d_{32}-g_{32}) + (i_{32}-b_{32}-(c_{32}-h_{32}))\bigr) = -2.1727$,
$\varphi_{42} = \frac{1}{2}\bigl(i_{32}-b_{32}-(c_{32}-h_{32})\bigr) = -1.8123$, $\varphi_{43} = \frac{1}{2}\bigl((d_{32}-g_{32})-(c_{32}-h_{32})\bigr) = -0.2759$,
$\varphi_{44} = \frac{1}{2}\bigl((d_{32}-g_{32})+(c_{32}-h_{32})\bigr) = 1.3870$, $\varphi_{45} = \frac{1}{2}(c_{32}-h_{32}) = 0.8315$.
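The magnitudes of these diagonal entries can be reproduced numerically. The sketch below assumes (this mapping is an assumption of the sketch, since the constants are defined earlier in the paper) that $c_{32}$, $d_{32}$, $g_{32}$ and $h_{32}$ denote the twiddle factors $W_{32}^{3}$, $W_{32}^{5}$, $W_{32}^{11}$ and $W_{32}^{13}$:

```python
import cmath

# Assumed reading: x32 constants are twiddle factors W_32^k = exp(-2*pi*j*k/32).
W = lambda k: cmath.exp(-2j * cmath.pi * k / 32)
c32, d32, g32, h32 = W(3), W(5), W(11), W(13)

phi36 = (c32 + h32) / 2                    # purely imaginary entry
phi44 = ((d32 - g32) + (c32 - h32)) / 2    # purely real entry
phi45 = (c32 - h32) / 2

# Sums of conjugate-symmetric pairs are purely imaginary,
# differences are purely real:
assert abs(phi36.real) < 1e-12
assert abs(phi44.imag) < 1e-12
# Magnitudes match the values of phi_36, phi_44 and phi_45 above:
assert abs(abs(phi36) - 0.5556) < 1e-3
assert abs(abs(phi44) - 1.3870) < 1e-3
assert abs(abs(phi45) - 0.8315) < 1e-3
```

This also makes visible why each entry of $\mathbf{D}_{46}$ is purely real or purely imaginary: sums and differences of conjugate-symmetric twiddle-factor pairs always fall on one axis.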
Figure 4 shows a data flow graph of a synthesized algorithm for the 32-point DFT. As can be seen, in this case, the algorithm takes 36 multiplications and 244 additions.

5. Conclusions

In this article, we have shown for the first time a simple, clear and unified approach to the derivation of fast Winograd-like DFT algorithms. The construction of the algorithms is based on the method of synthesis of fast algorithms for computing matrix–vector products described in [19]. The mathematical background is an original method of hierarchical factorization of the DFT matrix, which differs from the factorization used in the Cooley–Tukey FFT. The synthesis method is demonstrated by constructing algorithms for four typical lengths of the input data sequence: N = 4, N = 8, N = 16 and N = 32. As follows from Figure 1, the upper part of the data flow graph for N = 4, outlined by the dotted line, corresponds to the algorithm for N = 2. In turn, the upper part of the data flow graph for N = 8 (see Figure 2), outlined by a dotted line, corresponds to the algorithm for N = 4; the upper part of the graph for N = 16 (Figure 3) corresponds to the algorithm for N = 8; and, finally, the upper part of the graph for N = 32 (Figure 4) corresponds to the algorithm for N = 16. It is easy to verify that algorithms for other lengths of sequences that are powers of two can be synthesized in a similar way. The described method can therefore be considered universal.
The advantage of the presented algorithms over the Cooley–Tukey algorithms is that the critical path in the graph of any of the obtained algorithms contains only one multiplication. More than one multiplication in the critical path creates additional problems for the implementation of computations: multiplying two n-bit operands yields a 2n-bit product, so repeated multiplication requires additional manipulation of the operands and therefore more time and effort than a single multiplication. In fixed-point devices, this can necessitate overflow and underflow handling; if accuracy is to be preserved, double memory access is required both when writing and when reading. Using floating-point arithmetic in this case also creates additional problems related to exponent alignment, mantissa addition, etc. This is what we had in mind when we wrote about this additional advantage. Another important advantage of these algorithms over the Cooley–Tukey algorithms is that the multiplications here are either purely real or purely imaginary. Multiplying two complex numbers requires three multiplications of real numbers, whereas multiplying a complex number by a purely real or purely imaginary number requires only two real multiplications. This leads to an additional reduction in the multiplicative complexity of the computations. These two advantages are typical of all Winograd-type algorithms.

Author Contributions

Conceptualization, A.C.; methodology, A.C. and M.R.; validation, A.C. and M.R.; formal analysis, A.C. and M.R.; investigation, M.R.; writing—original draft, A.C. and M.R.; writing—review and editing, M.R.; supervision, A.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

We would like to thank Dorota Majorkowska-Mech for advice and guidance on how to improve the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. McClellan, J.H.; Rader, C.M. Number Theory in Digital Signal Processing; Prentice-Hall Signal Processing Series; Prentice-Hall: Englewood Cliffs, NJ, USA, 1979.
2. Elliott, D.F.; Rao, K.R. Fast Transforms: Algorithms, Analyses, Applications; Academic Press: Cambridge, MA, USA, 1982.
3. Elliott, D.F. (Ed.) Handbook of Digital Signal Processing: Engineering Applications; Academic Press: San Diego, CA, USA, 1987.
4. Nussbaumer, H.J. Fast Fourier Transform and Convolution Algorithms; Springer Series in Information Sciences; Springer: Berlin/Heidelberg, Germany, 1982; Volume 2.
5. Burrus, C.S.; Parks, T.W.; Potts, J.F. DFT/FFT and Convolution Algorithms: Theory and Implementation; Topics in Digital Signal Processing; Wiley: New York, NY, USA, 1984.
6. Blahut, R.E. Fast Algorithms for Digital Signal Processing; Addison-Wesley: Reading, MA, USA, 1985.
7. Tolimieri, R.; An, M.; Lu, C. Algorithms for Discrete Fourier Transform and Convolution; Signal Processing and Digital Filtering; Springer: New York, NY, USA, 1989.
8. Smith, W.W.; Smith, J.M. Handbook of Real-Time Fast Fourier Transforms: Algorithms to Product Testing; IEEE Press: New York, NY, USA, 1995.
9. Parhi, K.K. VLSI Digital Signal Processing Systems: Design and Implementation; Wiley: New York, NY, USA, 1999.
10. Moon, T.K.; Stirling, W.C. Mathematical Methods and Algorithms for Signal Processing; Prentice Hall: Upper Saddle River, NJ, USA, 2000.
11. Garg, H.K. Digital Signal Processing Algorithms: Number Theory, Convolution, Fast Fourier Transforms, and Applications; Computer Engineering Series; CRC Press: Boca Raton, FL, USA, 1998.
12. Bi, G.; Zeng, Y. Transforms and Fast Algorithms for Signal Analysis and Representations; Birkhäuser: Boston, MA, USA, 2004.
13. Burrus, C.S.; Selesnick, I. Winograd’s Short DFT Algorithms; OpenStax CNX, Rice University: Houston, TX, USA. Available online: http://cnx.org/content/m16333/latest (accessed on 15 March 2022).
14. Winograd, S. On computing the discrete Fourier transform. Math. Comput. 1978, 32, 175–199.
15. Silverman, H. An introduction to programming the Winograd Fourier transform algorithm (WFTA). IEEE Trans. Acoust. Speech Signal Process. 1977, 25, 152–165.
16. Regalia, P.A.; Mitra, S.K. Kronecker Products, Unitary Matrices and Signal Processing Applications. SIAM Rev. 1989, 31, 586–613.
17. Steeb, W.H.; Hardy, Y. Matrix Calculus and Kronecker Product: A Practical Approach to Linear and Multilinear Algebra, 2nd ed.; World Scientific: Singapore, 2011.
18. Beck, A.; Tetruashvili, L. On the Convergence of Block Coordinate Descent Type Methods. SIAM J. Optim. 2013, 23, 2037–2060.
19. Cariow, A. Strategies for the Synthesis of Fast Algorithms for the Computation of the Matrix-Vector Products. J. Signal Process. Theory Appl. 2014.
20. Andreatto, B.; Cariow, A. Automatic generation of fast algorithms for matrix–vector multiplication. Int. J. Comput. Math. 2018, 95, 626–644.
Figure 1. The data flow graph of the proposed algorithm for computation of 4-point DFT.
Figure 2. The data flow graph of the proposed algorithm for computation of 8-point DFT.
Figure 3. The data flow graph of the proposed algorithm for computation of 16-point DFT.
Figure 4. The data flow graph of the proposed algorithm for computation of 32-point DFT.