
On the Derivation of Winograd-Type DFT Algorithms for Input Sequences Whose Length Is a Power of Two

by Mateusz Raciborski *,† and Aleksandr Cariow †
Faculty of Computer Science and Information Technology, West Pomeranian University of Technology, Żołnierska 52, 71-210 Szczecin, Poland
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Electronics 2022, 11(9), 1342; https://doi.org/10.3390/electronics11091342
Submission received: 16 March 2022 / Revised: 11 April 2022 / Accepted: 20 April 2022 / Published: 23 April 2022
(This article belongs to the Special Issue Efficient Algorithms and Architectures for DSP Applications)

Abstract:
Winograd’s algorithms are an effective tool for calculating the discrete Fourier transform (DFT). In the well-known articles, these algorithms are traditionally represented either by sets of recurrence relations or by products of sparse matrices obtained from various factorizations of the DFT matrix. Unfortunately, those papers do not show how the described relations were obtained or how the presented factorizations were found. In this paper, we use a simple, understandable and fairly unified approach to the derivation of Winograd-type DFT algorithms for the cases N = 8, N = 16 and N = 32. It is easy to verify that algorithms for other sequence lengths that are powers of two can be synthesized in a similar way.

1. Introduction

Winograd’s method for the realization of the discrete Fourier transform (DFT) has been discussed in a number of publications for several decades [1,2,3,4,5,6,7,8,9,10,11,12]. In comparison with the Cooley–Tukey fast Fourier transform (FFT) algorithms, the Winograd DFT algorithm requires substantially fewer multiplications at the cost of a few extra additions. The known papers mainly consider Winograd FFTs for small sequences of odd length. Moreover, the algorithms were presented in the form of algebraic relations or in the form of DFT matrix factorizations. However, none of the publications known to us explains how these relations were obtained or by what considerations the matrices that make up the corresponding computational procedures were constructed.
In this paper, we want to show a simple, understandable and fairly unified approach to the derivation of Winograd-type FFT algorithms for the cases N = 8, N = 16 and N = 32. It is easy to verify that algorithms for other lengths of sequences that are powers of two can be synthesized similarly.

2. Preliminary Remarks

The discrete Fourier transform (DFT) is one of the most important tools in digital signal and image processing. The DFT can be defined as follows:
$$y_k = \sum_{n=0}^{N-1} x_n e^{-j\frac{2\pi nk}{N}}, \quad k = 0, 1, \dots, N-1, \qquad (1)$$
where $x_n$, $n = 0, 1, \dots, N-1$, is a uniformly sampled sequence, $y_k$ is the $k$-th DFT coefficient, and $j = \sqrt{-1}$ is the imaginary unit.
In vector–matrix notation, we can rewrite (1) in the following form:
$$\mathbf{Y}_{N \times 1} = \mathbf{E}_N \mathbf{X}_{N \times 1}, \qquad (2)$$
where
$$\mathbf{X}_{N \times 1} = [x_0, x_1, \dots, x_{N-1}]^{\mathrm{T}}, \quad \mathbf{Y}_{N \times 1} = [y_0, y_1, \dots, y_{N-1}]^{\mathrm{T}}$$
and
$$\mathbf{E}_N = \{w^{kn}\}, \quad w^{kn} = e^{-j\frac{2\pi nk}{N}}, \quad k, n = 0, 1, \dots, N-1.$$
Implementation of calculations in accordance with expression (2), especially for large N, requires performing a large number of arithmetic operations, which in turn leads to an increase in computation time.
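The direct evaluation of (2) can be sketched in a few lines; the function name `dft_direct` and the use of NumPy are our own illustrative choices, not part of the paper:

```python
import numpy as np

def dft_direct(x):
    """Direct evaluation of (2): y = E_N x, where E_N[k, n] = exp(-j*2*pi*k*n/N).

    Costs O(N^2) complex multiplications and additions.
    """
    x = np.asarray(x, dtype=complex)
    N = len(x)
    k = np.arange(N)
    E = np.exp(-2j * np.pi * np.outer(k, k) / N)  # the N x N DFT matrix E_N
    return E @ x
```

For small N this agrees with any standard FFT routine; the quadratic operation count is precisely what the factorizations discussed next avoid.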
In 1965, J. Cooley and J. Tukey proposed a fast algorithm for computing the discrete Fourier transform with a drastically reduced number of arithmetical operations. Mathematically, fast Fourier transform algorithms are based on factorizing the Fourier matrix into a product of sparse matrices, i.e., matrices with many zero entries. This factorization can be carried out in different ways. In the case of the Cooley–Tukey algorithm, we are dealing with the representation of the original matrix as a product of $\log_2 N$ sparse structured matrices. As is well known, the complexity of this algorithm is approximately $\frac{N}{2}\log_2 N$ multiplications and about $N \log_2 N$ additions of complex numbers.
Another effective algorithm for calculating the DFT is the Winograd FFT algorithm. In comparison with the Cooley–Tukey FFT algorithm, it requires substantially fewer multiplications at the cost of a few extra additions: Winograd proved that the multiplicative complexity of FFT algorithms can be significantly reduced at the price of some increase in additive complexity. The Winograd Fourier transform algorithm (WFTA) is an FFT algorithm which reduces the number of multiplications from order $O(N^2)$ in the DFT to order $N$. The literature known to the authors [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15] mainly considers Winograd FFT algorithms that implement the DFT for a limited set of small-length sequences. As a rule, these algorithms are represented as a set of algebraic relations [1,2,3,4,5,8], although matrix interpretations of the Winograd FFT algorithms are available too [6,9]. In the case of the matrix formulation, the factorization of the DFT matrix differs from the factorization used in the Cooley–Tukey FFT algorithms. Moreover, the mechanism for deriving such algorithms is unique to each specific case. In addition, methods for deriving the recurrence relations have not been published anywhere, and the ways of deriving the factorized representations of the DFT matrices have never been explained. In this paper, we show a simple, understandable and fairly unified approach to the derivation of Winograd-like FFT algorithms for the case when the input sequence length is a power of two.

3. Short Background

The main idea of the proposed approach is to use a new method for factorizing the DFT matrix, one that differs from Winograd’s factorization. In contrast to the Winograd factorization, we propose the following unified method of DFT matrix decomposition:
$$\mathbf{E}_{2^{i+1}} = (\mathbf{H}_2 \otimes \mathbf{I}_{2^i})(\mathbf{E}_{2^i} \oplus \mathbf{Q}_{2^i})\,\mathbf{P}_{2^{i+1}}(\pi_{2^{i+1}}), \qquad (3)$$
where
$$\mathbf{P}_{2^{i+1}}(\pi_{2^{i+1}}) = \begin{bmatrix} \mathbf{I}_{2^{i-1}} \otimes \boldsymbol{\Psi}_{2 \times 4} \\ (\mathbf{I}_{2^{i-1}} \otimes \boldsymbol{\Psi}_{2 \times 4})\,\mathbf{I}_{2^{i+1}}^{(1)} \end{bmatrix}, \quad \boldsymbol{\Psi}_{2 \times 4} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}, \quad i = 1, 2, 3,$$
$\mathbf{E}_k$ is the $k \times k$ DFT matrix; $\mathbf{Q}_k$ is a “prefix” matrix containing a constellation of twiddle factors specific to each $N$; $\mathbf{I}_k$ is the $k \times k$ identity matrix; $\mathbf{H}_2$ is the order-2 Hadamard matrix; $\mathbf{I}_{2^{i+1}}^{(1)}$ is the matrix obtained from the $2^{i+1} \times 2^{i+1}$ identity matrix by cyclically shifting its columns one position to the right; and the signs “⊗” and “⊕” denote the tensor (Kronecker) product and the direct sum of two matrices, respectively [16,17].
Then, the generalized scheme for the synthesis of Winograd-type DFT algorithms for $N$ equal to a power of two can be described as follows:
$$\mathbf{Y}_{2^{i+1} \times 1} = (\mathbf{H}_2 \otimes \mathbf{I}_{2^i})(\mathbf{E}_{2^i} \oplus \mathbf{Q}_{2^i})\,\mathbf{P}_{2^{i+1}}(\pi_{2^{i+1}})\,\mathbf{X}_{2^{i+1} \times 1}. \qquad (4)$$
The methods for factorizing the matrices $\mathbf{E}_k$ and $\mathbf{Q}_k$ are different, but both lead to a factorization of the BCD type [18] similar to the Winograd factorization. Moreover, as follows from expression (3), the expansions for small $N$ are part of the expansions for larger lengths of input sequences. When synthesizing the algorithms for the separate $\mathbf{E}_k$ and $\mathbf{Q}_k$, we will use the templates of matrix structures and identities presented in [19,20].
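Decomposition (3) is easy to check numerically. The sketch below (our own illustration; only the matrix names follow the text) builds the DFT matrix of order $2N$, permutes its columns into even–odd order, reads off $\mathbf{E}_N$ and $\mathbf{Q}_N$, and reassembles the original matrix according to (3):

```python
import numpy as np

def dft_matrix(N):
    k = np.arange(N)
    return np.exp(-2j * np.pi * np.outer(k, k) / N)

def even_odd_perm(N):
    """P_N(pi_N): row r selects sample order[r], so P @ x = [x_even; x_odd]."""
    order = list(range(0, N, 2)) + list(range(1, N, 2))
    return np.eye(N)[order]

N = 8
E2N = dft_matrix(2 * N)
P = even_odd_perm(2 * N)
Et = E2N @ P.T                    # column-permuted DFT matrix (E~ in the text)
EN, QN = Et[:N, :N], Et[:N, N:]   # top blocks: E_N and the prefix matrix Q_N
H2I = np.kron([[1, 1], [1, -1]], np.eye(N))   # H2 (x) I_N
zeros = np.zeros((N, N))
rebuilt = H2I @ np.block([[EN, zeros], [zeros, QN]]) @ P   # (H2 (x) I_N)(E_N (+) Q_N) P
```

The reassembled product matches the full DFT matrix, and the top-left block is itself the $N$-point DFT matrix, which is what makes the decomposition recursive.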

4. Synthesis of the Fast Winograd-Type DFT Algorithms

Let us show, using specific examples, how this approach works.

4.1. Fast DFT Algorithm for N = 4

As an example, suppose that N = 4. Then (2) can be rewritten as
$$\mathbf{Y}_{4 \times 1} = \mathbf{E}_4 \mathbf{X}_{4 \times 1},$$
where
$$\mathbf{E}_4 = \begin{bmatrix} a_4 & a_4 & a_4 & a_4 \\ a_4 & b_4 & -a_4 & -b_4 \\ a_4 & -a_4 & a_4 & -a_4 \\ a_4 & -b_4 & -a_4 & b_4 \end{bmatrix}, \quad a_4 = 1, \quad b_4 = -j,$$
$$\mathbf{X}_{4 \times 1} = [x_0, x_1, x_2, x_3]^{\mathrm{T}}, \quad \mathbf{Y}_{4 \times 1} = [y_0, y_1, y_2, y_3]^{\mathrm{T}}.$$
Let us now define the permutation $\pi_4$ and write it as a matrix in this way:
$$\pi_4 = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 1 & 3 & 2 & 4 \end{pmatrix}, \quad \mathbf{P}_4(\pi_4) = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$
Permuting the columns of the matrix $\mathbf{E}_4$ according to the permutation $\pi_4$, we obtain the matrix
$$\tilde{\mathbf{E}}_4 = \begin{bmatrix} \mathbf{E}_2 & \mathbf{Q}_2 \\ \mathbf{E}_2 & -\mathbf{Q}_2 \end{bmatrix} = \mathbf{E}_4 \mathbf{P}_4(\pi_4),$$
where
$$\mathbf{E}_2 = \begin{bmatrix} a_4 & a_4 \\ a_4 & -a_4 \end{bmatrix} \quad \text{and} \quad \mathbf{Q}_2 = \begin{bmatrix} a_4 & a_4 \\ b_4 & -b_4 \end{bmatrix}.$$
According to the concept, Expression (4) for a given transform size can be rewritten as follows:
$$\mathbf{Y}_{4 \times 1} = (\mathbf{H}_2 \otimes \mathbf{I}_2)(\mathbf{E}_2 \oplus \mathbf{Q}_2)\,\mathbf{P}_4(\pi_4)\,\mathbf{X}_{4 \times 1},$$
where
$$\mathbf{H}_2 \otimes \mathbf{I}_2 = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \otimes \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & -1 \end{bmatrix} = \mathbf{W}_4^{(0)}.$$
As can be seen, after rearranging the columns of the DFT matrix $\mathbf{E}_4$, it decomposes, as follows from the proposed technique, into the order-2 DFT matrix $\mathbf{E}_2$ and the order-2 prefix matrix $\mathbf{Q}_2$.
For matrices E 2 and Q 2 , we can offer the following factorization schemes leading to a reduction in computational complexity:
$$\mathbf{E}_2 = \begin{bmatrix} a_4 & a_4 \\ a_4 & -a_4 \end{bmatrix} = (a_4 \oplus a_4)\,\mathbf{H}_2, \quad \mathbf{Q}_2 = \begin{bmatrix} a_4 & a_4 \\ b_4 & -b_4 \end{bmatrix} = (a_4 \oplus b_4)\,\mathbf{H}_2.$$
Taking into account the above factorization schemes, we can finally write
$$\mathbf{Y}_{4 \times 1} = \mathbf{W}_4^{(0)} \mathbf{D}_4 \mathbf{W}_4^{(1)} \mathbf{P}_4(\pi_4)\,\mathbf{X}_{4 \times 1}, \qquad (7)$$
where
$$\mathbf{W}_4^{(1)} = \mathbf{H}_2 \oplus \mathbf{H}_2 = \mathbf{I}_2 \otimes \mathbf{H}_2 = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & -1 \end{bmatrix},$$
$$\mathbf{D}_4 = \mathrm{diag}(\varphi_0, \varphi_1, \varphi_2, \varphi_3), \quad \varphi_0 = \varphi_1 = \varphi_2 = a_4 = 1, \quad \varphi_3 = b_4 = -j.$$
Figure 1 shows a data flow graph of the synthesized algorithm for the 4-point DFT. As can be seen, in this case, the algorithm takes only eight additions.
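The eight additions of the N = 4 flow graph can be written out directly; this is a sketch of Expression (7) in code form (the function name `dft4` is ours):

```python
import numpy as np

def dft4(x):
    """4-point DFT via Y = W4(0) D4 W4(1) P4(pi4) X: eight additions and no
    non-trivial multiplications (the factor -j is a sign change and swap)."""
    x0, x1, x2, x3 = x
    # P4(pi4) reorders the input to (x0, x2, x1, x3);
    # W4(1) = H2 (+) H2 then applies two butterflies:
    s0, s1 = x0 + x2, x0 - x2
    s2, s3 = x1 + x3, x1 - x3
    s3 = -1j * s3                      # D4 = diag(1, 1, 1, -j)
    # W4(0) = H2 (x) I2: final pair of butterflies
    return np.array([s0 + s2, s1 + s3, s0 - s2, s1 - s3])
```

The result matches the direct evaluation of (2) for N = 4.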

4.2. Fast DFT Algorithm for N = 8

As an example, suppose that N = 8. Then (2) can be rewritten as
$$\mathbf{Y}_{8 \times 1} = \mathbf{E}_8 \mathbf{X}_{8 \times 1},$$
where
$$\mathbf{X}_{8 \times 1} = [x_0, x_1, x_2, x_3, x_4, x_5, x_6, x_7]^{\mathrm{T}}, \quad \mathbf{Y}_{8 \times 1} = [y_0, y_1, y_2, y_3, y_4, y_5, y_6, y_7]^{\mathrm{T}},$$
$$a_8 = 1, \quad b_8 = 0.7071 - j0.7071, \quad c_8 = -j, \quad d_8 = -0.7071 - j0.7071,$$
$$\mathbf{E}_8 = \begin{bmatrix}
a_8 & a_8 & a_8 & a_8 & a_8 & a_8 & a_8 & a_8 \\
a_8 & b_8 & c_8 & d_8 & -a_8 & -b_8 & -c_8 & -d_8 \\
a_8 & c_8 & -a_8 & -c_8 & a_8 & c_8 & -a_8 & -c_8 \\
a_8 & d_8 & -c_8 & b_8 & -a_8 & -d_8 & c_8 & -b_8 \\
a_8 & -a_8 & a_8 & -a_8 & a_8 & -a_8 & a_8 & -a_8 \\
a_8 & -b_8 & c_8 & -d_8 & -a_8 & b_8 & -c_8 & d_8 \\
a_8 & -c_8 & -a_8 & c_8 & a_8 & -c_8 & -a_8 & c_8 \\
a_8 & -d_8 & -c_8 & -b_8 & -a_8 & d_8 & c_8 & b_8
\end{bmatrix}.$$
Let us now define the permutation $\pi_8$ in the following form:
$$\pi_8 = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 1 & 3 & 5 & 7 & 2 & 4 & 6 & 8 \end{pmatrix}.$$
Permutation $\pi_8$ can be written as a matrix in this way:
$$\mathbf{P}_8(\pi_8) = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}.$$
Permuting the columns of the matrix $\mathbf{E}_8$ according to the permutation $\pi_8$, we obtain the matrix
$$\tilde{\mathbf{E}}_8 = \begin{bmatrix} \mathbf{E}_4 & \mathbf{Q}_4 \\ \mathbf{E}_4 & -\mathbf{Q}_4 \end{bmatrix}, \quad \text{so that} \quad \mathbf{E}_8 = \tilde{\mathbf{E}}_8\,\mathbf{P}_8(\pi_8),$$
where
$$\mathbf{E}_4 = \begin{bmatrix} a_8 & a_8 & a_8 & a_8 \\ a_8 & c_8 & -a_8 & -c_8 \\ a_8 & -a_8 & a_8 & -a_8 \\ a_8 & -c_8 & -a_8 & c_8 \end{bmatrix} \quad \text{and} \quad \mathbf{Q}_4 = \begin{bmatrix} a_8 & a_8 & a_8 & a_8 \\ b_8 & d_8 & -b_8 & -d_8 \\ c_8 & -c_8 & c_8 & -c_8 \\ d_8 & b_8 & -d_8 & -b_8 \end{bmatrix}.$$
According to the concept, Expression (4) for a given transform size can be rewritten as follows:
$$\mathbf{Y}_{8 \times 1} = (\mathbf{H}_2 \otimes \mathbf{I}_4)(\mathbf{E}_4 \oplus \mathbf{Q}_4)\,\mathbf{P}_8(\pi_8)\,\mathbf{X}_{8 \times 1}. \qquad (8)$$
Such a structure of the matrix $\tilde{\mathbf{E}}_8$ allows us to apply a divide-and-conquer approach that recursively breaks down a matrix–vector product of order eight into two smaller matrix–vector products of order four [19]. If we write the matrix $\mathbf{E}_8$ as the product $\tilde{\mathbf{E}}_8\,\mathbf{P}_8(\pi_8)$, Equation (8) takes the form:
$$\mathbf{Y}_{8 \times 1} = \mathbf{W}_8^{(0)}(\mathbf{E}_4 \oplus \mathbf{Q}_4)\,\mathbf{P}_8(\pi_8)\,\mathbf{X}_{8 \times 1},$$
where
$$\mathbf{W}_8^{(0)} = \mathbf{H}_2 \otimes \mathbf{I}_4.$$
Permuting the columns of the matrix $\mathbf{E}_4$ and the rows of the matrix $\mathbf{Q}_4$ according to the permutation $\pi_4$, we obtain the matrices
$$\tilde{\mathbf{E}}_4 = \begin{bmatrix} \mathbf{A}_2 & \mathbf{B}_2 \\ \mathbf{A}_2 & -\mathbf{B}_2 \end{bmatrix} \quad \text{and} \quad \tilde{\mathbf{Q}}_4 = \begin{bmatrix} \mathbf{C}_2 & \mathbf{C}_2 \\ \mathbf{D}_2 & -\mathbf{D}_2 \end{bmatrix},$$
where
$$\mathbf{A}_2 = \begin{bmatrix} a_8 & a_8 \\ a_8 & -a_8 \end{bmatrix}, \quad \mathbf{B}_2 = \begin{bmatrix} a_8 & a_8 \\ c_8 & -c_8 \end{bmatrix}, \quad \mathbf{C}_2 = \begin{bmatrix} a_8 & a_8 \\ c_8 & -c_8 \end{bmatrix}, \quad \mathbf{D}_2 = \begin{bmatrix} b_8 & d_8 \\ d_8 & b_8 \end{bmatrix}.$$
Such structures of the matrices $\tilde{\mathbf{E}}_4$ and $\tilde{\mathbf{Q}}_4$ allow us to apply the same factorization schemes. Therefore, we can write
$$\mathbf{Y}_{8 \times 1} = \mathbf{W}_8^{(0)} \tilde{\mathbf{W}}_8^{(0)}(\mathbf{A}_2 \oplus \mathbf{B}_2 \oplus \mathbf{C}_2 \oplus \mathbf{D}_2)\,\tilde{\mathbf{W}}_8^{(1)}\,\mathbf{P}_8(\pi_8)\,\mathbf{X}_{8 \times 1},$$
where
$$\tilde{\mathbf{W}}_8^{(0)} = (\mathbf{H}_2 \otimes \mathbf{I}_2) \oplus \mathbf{P}_4(\pi_4), \quad \tilde{\mathbf{W}}_8^{(1)} = \mathbf{P}_4(\pi_4) \oplus (\mathbf{H}_2 \otimes \mathbf{I}_2).$$
For the matrices $\mathbf{A}_2$, $\mathbf{B}_2$, $\mathbf{C}_2$ and $\mathbf{D}_2$, we can offer the following factorization schemes leading to a reduction in computational complexity:
$$\mathbf{A}_2 = (a_8 \oplus a_8)\,\mathbf{H}_2, \quad \mathbf{B}_2 = \mathbf{C}_2 = (a_8 \oplus c_8)\,\mathbf{H}_2,$$
$$\mathbf{D}_2 = \begin{bmatrix} b_8 & d_8 \\ d_8 & b_8 \end{bmatrix} = \mathbf{H}_2\,\frac{1}{2}\big[(b_8 + d_8) \oplus (b_8 - d_8)\big]\,\mathbf{H}_2.$$
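The factorization of $\mathbf{D}_2$ trades a direct 2 × 2 complex product for two multiplications by the constants $\frac{1}{2}(b_8 + d_8) = -j0.7071$ and $\frac{1}{2}(b_8 - d_8) = 0.7071$. A quick numerical check (our own sketch, not from the paper):

```python
import numpy as np

H2 = np.array([[1, 1], [1, -1]], dtype=complex)
b8 = np.exp(-1j * np.pi / 4)        # 0.7071 - j0.7071
d8 = np.exp(-3j * np.pi / 4)        # -0.7071 - j0.7071

D2 = np.array([[b8, d8], [d8, b8]])
# H2 @ diag((b8+d8)/2, (b8-d8)/2) @ H2 restores D2
D2_fact = H2 @ np.diag([(b8 + d8) / 2, (b8 - d8) / 2]) @ H2
```

The two diagonal entries are purely imaginary and purely real, respectively, which is why only two non-trivial real-valued multipliers remain in the final algorithm.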
Taking into account the above factorization schemes, we can finally write
$$\mathbf{Y}_{8 \times 1} = \mathbf{W}_8^{(0)} \tilde{\mathbf{W}}_8^{(0)} \mathbf{W}_8^{(3)} \mathbf{D}_8 \mathbf{W}_8^{(4)} \tilde{\mathbf{W}}_8^{(1)}\,\mathbf{P}_8(\pi_8)\,\mathbf{X}_{8 \times 1}, \qquad (12)$$
where
$$\mathbf{W}_8^{(4)} = \mathbf{I}_4 \otimes \mathbf{H}_2, \quad \mathbf{W}_8^{(3)} = \mathbf{I}_6 \oplus \mathbf{H}_2,$$
$$\mathbf{D}_8 = \mathrm{diag}(\varphi_0, \varphi_1, \dots, \varphi_7),$$
$$\varphi_0 = \varphi_1 = \varphi_2 = \varphi_4 = a_8 = 1, \quad \varphi_3 = \varphi_5 = c_8 = -j,$$
$$\varphi_6 = \tfrac{1}{2}(b_8 + d_8) = -j0.7071, \quad \varphi_7 = \tfrac{1}{2}(b_8 - d_8) = 0.7071.$$
Expression (12) describes the Winograd-type fast Fourier transform algorithm for N = 8. Figure 2 shows a data flow graph of the synthesized algorithm for the 8-point DFT. As can be seen, in this case the algorithm takes 2 multiplications and 26 additions.
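Expression (12) can be checked end to end by multiplying out its factor matrices; the helper `dsum` and all variable names below are our own, but the factors follow (12):

```python
import numpy as np

H2 = np.array([[1, 1], [1, -1]], dtype=complex)

def dsum(*blocks):
    """Direct sum (block-diagonal stacking) of matrices."""
    rows = sum(b.shape[0] for b in blocks)
    cols = sum(b.shape[1] for b in blocks)
    out = np.zeros((rows, cols), dtype=complex)
    r = c = 0
    for b in blocks:
        out[r:r + b.shape[0], c:c + b.shape[1]] = b
        r += b.shape[0]
        c += b.shape[1]
    return out

b8, d8 = np.exp(-1j * np.pi / 4), np.exp(-3j * np.pi / 4)
I2, I4, I6 = np.eye(2), np.eye(4), np.eye(6)
P4 = np.eye(4)[[0, 2, 1, 3]]               # P4(pi4)
P8 = np.eye(8)[[0, 2, 4, 6, 1, 3, 5, 7]]   # P8(pi8): even-odd sort

W8_0  = np.kron(H2, I4)                    # H2 (x) I4
Wt8_0 = dsum(np.kron(H2, I2), P4)          # (H2 (x) I2) (+) P4(pi4)
W8_3  = dsum(I6, H2)                       # I6 (+) H2
D8    = np.diag([1, 1, 1, -1j, 1, -1j, (b8 + d8) / 2, (b8 - d8) / 2])
W8_4  = np.kron(I4, H2)                    # I4 (x) H2
Wt8_1 = dsum(P4, np.kron(H2, I2))          # P4(pi4) (+) (H2 (x) I2)

F8 = W8_0 @ Wt8_0 @ W8_3 @ D8 @ W8_4 @ Wt8_1 @ P8
```

The product `F8` equals the 8-point DFT matrix, and only the last two diagonal entries of `D8` are non-trivial, matching the count of 2 multiplications.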

4.3. Fast DFT Algorithm for N = 16

Now let us consider the synthesis of a similar algorithm for N = 16. In matrix–vector notation, we can rewrite the DFT in the following form:
$$\mathbf{Y}_{16 \times 1} = \mathbf{E}_{16} \mathbf{X}_{16 \times 1}, \qquad (13)$$
where
$$\mathbf{X}_{16 \times 1} = [x_0, x_1, \dots, x_{15}]^{\mathrm{T}}, \quad \mathbf{Y}_{16 \times 1} = [y_0, y_1, \dots, y_{15}]^{\mathrm{T}},$$
$$\mathbf{E}_{16} = \begin{bmatrix} \mathbf{E}_8^{(0,0)} & \mathbf{E}_8^{(0,1)} \\ \mathbf{E}_8^{(1,0)} & \mathbf{E}_8^{(1,1)} \end{bmatrix},$$
where (the subscript 16 on each entry is omitted for brevity, and $k, n = 0, 1, \dots, 7$ index rows and columns within a block)
$$\mathbf{E}_8^{(0,0)} = \begin{bmatrix}
a & a & a & a & a & a & a & a \\
a & b & c & d & e & f & g & h \\
a & c & e & g & -a & -c & -e & -g \\
a & d & g & -b & -e & -h & c & f \\
a & e & -a & -e & a & e & -a & -e \\
a & f & -c & -h & e & -b & -g & d \\
a & g & -e & c & -a & -g & e & -c \\
a & h & -g & f & -e & d & -c & b
\end{bmatrix},$$
and the remaining blocks follow from $w_{16}^{8} = -1$:
$$\big[\mathbf{E}_8^{(0,1)}\big]_{k,n} = (-1)^k \big[\mathbf{E}_8^{(0,0)}\big]_{k,n}, \quad \big[\mathbf{E}_8^{(1,0)}\big]_{k,n} = (-1)^n \big[\mathbf{E}_8^{(0,0)}\big]_{k,n}, \quad \big[\mathbf{E}_8^{(1,1)}\big]_{k,n} = (-1)^{k+n} \big[\mathbf{E}_8^{(0,0)}\big]_{k,n},$$
where
$$a_{16} = 1, \quad b_{16} = 0.9239 - j0.3827, \quad c_{16} = 0.7071 - j0.7071,$$
$$d_{16} = 0.3827 - j0.9239, \quad e_{16} = -j, \quad f_{16} = -0.3827 - j0.9239,$$
$$g_{16} = -0.7071 - j0.7071, \quad h_{16} = -0.9239 - j0.3827.$$
Let us define the permutation $\pi_{16}$ in the following form:
$$\pi_{16} = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 & 13 & 14 & 15 & 16 \\ 1 & 3 & 5 & 7 & 9 & 11 & 13 & 15 & 2 & 4 & 6 & 8 & 10 & 12 & 14 & 16 \end{pmatrix}.$$
Permuting the columns of the matrix $\mathbf{E}_{16}$ according to the permutation $\pi_{16}$, we obtain the matrix
$$\tilde{\mathbf{E}}_{16} = \begin{bmatrix} \mathbf{E}_8 & \mathbf{Q}_8 \\ \mathbf{E}_8 & -\mathbf{Q}_8 \end{bmatrix}, \quad \text{so that} \quad \mathbf{E}_{16} = \tilde{\mathbf{E}}_{16}\,\mathbf{P}_{16}(\pi_{16}),$$
where (all entries carry the subscript 16, omitted for brevity)
$$\mathbf{E}_8 = \begin{bmatrix}
a & a & a & a & a & a & a & a \\
a & c & e & g & -a & -c & -e & -g \\
a & e & -a & -e & a & e & -a & -e \\
a & g & -e & c & -a & -g & e & -c \\
a & -a & a & -a & a & -a & a & -a \\
a & -c & e & -g & -a & c & -e & g \\
a & -e & -a & e & a & -e & -a & e \\
a & -g & -e & -c & -a & g & e & c
\end{bmatrix},$$
$$\mathbf{Q}_8 = \begin{bmatrix}
a & a & a & a & a & a & a & a \\
b & d & f & h & -b & -d & -f & -h \\
c & g & -c & -g & c & g & -c & -g \\
d & -b & -h & f & -d & b & h & -f \\
e & -e & e & -e & e & -e & e & -e \\
f & -h & -b & d & -f & h & b & -d \\
g & c & -g & -c & g & c & -g & -c \\
h & f & d & b & -h & -f & -d & -b
\end{bmatrix}.$$
According to the concept, expression (4) for a given transform size can be rewritten as follows:
$$\mathbf{Y}_{16 \times 1} = (\mathbf{H}_2 \otimes \mathbf{I}_8)(\mathbf{E}_8 \oplus \mathbf{Q}_8)\,\mathbf{P}_{16}(\pi_{16})\,\mathbf{X}_{16 \times 1}.$$
Such a matrix structure allows the matrix $\tilde{\mathbf{E}}_{16}$ to be factorized in the same way as was done for the matrix of order N = 8 [19].
If we write the matrix $\mathbf{E}_{16}$ as the product $\tilde{\mathbf{E}}_{16}\,\mathbf{P}_{16}(\pi_{16})$, Equation (13) takes the form
$$\mathbf{Y}_{16 \times 1} = \mathbf{W}_{16}^{(0)}(\mathbf{E}_8 \oplus \mathbf{Q}_8)\,\mathbf{P}_{16}(\pi_{16})\,\mathbf{X}_{16 \times 1},$$
where the corresponding permutation matrix $\mathbf{P}_{16}(\pi_{16})$ takes the following form:
$$\check{\mathbf{P}}_{4 \times 8} = \begin{bmatrix} 1&0&0&0&0&0&0&0 \\ 0&0&1&0&0&0&0&0 \\ 0&0&0&0&1&0&0&0 \\ 0&0&0&0&0&0&1&0 \end{bmatrix}, \quad \hat{\mathbf{P}}_{4 \times 8} = \begin{bmatrix} 0&1&0&0&0&0&0&0 \\ 0&0&0&1&0&0&0&0 \\ 0&0&0&0&0&1&0&0 \\ 0&0&0&0&0&0&0&1 \end{bmatrix},$$
$$\check{\mathbf{P}}_{8 \times 16} = \check{\mathbf{P}}_{4 \times 8} \oplus \check{\mathbf{P}}_{4 \times 8}, \quad \hat{\mathbf{P}}_{8 \times 16} = \hat{\mathbf{P}}_{4 \times 8} \oplus \hat{\mathbf{P}}_{4 \times 8},$$
$$\mathbf{P}_{16}(\pi_{16}) = \mathbf{P}_{16}^{(0)} = \begin{bmatrix} \check{\mathbf{P}}_{8 \times 16} \\ \hat{\mathbf{P}}_{8 \times 16} \end{bmatrix} \quad \text{and} \quad \mathbf{W}_{16}^{(0)} = \mathbf{H}_2 \otimes \mathbf{I}_8.$$
Now let us permute the columns of the matrix $\mathbf{E}_8$ according to the permutation $\pi_8$. As a result of such a permutation, we obtain the matrix
$$\tilde{\mathbf{E}}_8 = \begin{bmatrix} \mathbf{A}_4 & \mathbf{B}_4 \\ \mathbf{A}_4 & -\mathbf{B}_4 \end{bmatrix},$$
where
$$\mathbf{A}_4 = \begin{bmatrix} a & a & a & a \\ a & e & -a & -e \\ a & -a & a & -a \\ a & -e & -a & e \end{bmatrix} \quad \text{and} \quad \mathbf{B}_4 = \begin{bmatrix} a & a & a & a \\ c & g & -c & -g \\ e & -e & e & -e \\ g & c & -g & -c \end{bmatrix}.$$
Then the matrix $\mathbf{E}_8$ can be represented as the product $\tilde{\mathbf{E}}_8\,\mathbf{P}_8(\pi_8) = \mathbf{W}_8^{(0)}(\mathbf{A}_4 \oplus \mathbf{B}_4)\,\mathbf{P}_8(\pi_8)$. Next, we permute the rows of the matrix $\mathbf{Q}_8$ according to the permutation $\pi_8$. As a result of such a permutation, we obtain the matrix
$$\tilde{\mathbf{Q}}_8 = \begin{bmatrix} \mathbf{C}_4 & \mathbf{C}_4 \\ \mathbf{D}_4 & -\mathbf{D}_4 \end{bmatrix},$$
where
$$\mathbf{C}_4 = \begin{bmatrix} a & a & a & a \\ c & g & -c & -g \\ e & -e & e & -e \\ g & c & -g & -c \end{bmatrix} \quad \text{and} \quad \mathbf{D}_4 = \begin{bmatrix} b & d & f & h \\ d & -b & -h & f \\ f & -h & -b & d \\ h & f & d & b \end{bmatrix}.$$
Then the matrix $\mathbf{Q}_8$ can be represented as the product $\mathbf{P}_8(\pi_8)^{\mathrm{T}}\tilde{\mathbf{Q}}_8 = \mathbf{P}_8(\pi_8)^{\mathrm{T}}(\mathbf{C}_4 \oplus \mathbf{D}_4)\,\mathbf{W}_8^{(0)}$. Taking into account the above factorization schemes, we can write
$$\mathbf{Y}_{16 \times 1} = \mathbf{W}_{16}^{(0)} \tilde{\mathbf{W}}_{16}^{(0)}(\mathbf{A}_4 \oplus \mathbf{B}_4 \oplus \mathbf{C}_4 \oplus \mathbf{D}_4)\,\tilde{\mathbf{W}}_{16}^{(1)}\,\mathbf{P}_{16}^{(0)}\,\mathbf{X}_{16 \times 1},$$
where
$$\tilde{\mathbf{W}}_{16}^{(0)} = \mathbf{W}_8^{(0)} \oplus \mathbf{P}_8(\pi_8)^{\mathrm{T}}, \quad \tilde{\mathbf{W}}_{16}^{(1)} = \mathbf{P}_8(\pi_8) \oplus \mathbf{W}_8^{(0)}.$$
We now consider the matrices $\mathbf{A}_4$, $\mathbf{B}_4$, $\mathbf{C}_4$, and $\mathbf{D}_4$. Permuting the columns of the matrix $\mathbf{A}_4$ according to the permutation $\pi_4$, we obtain the matrix
$$\tilde{\mathbf{A}}_4 = \begin{bmatrix} \mathbf{A}_2 & \mathbf{B}_2 \\ \mathbf{A}_2 & -\mathbf{B}_2 \end{bmatrix}.$$
Next, permuting the rows of the matrix $\mathbf{B}_4$ according to the permutation $\pi_4$, we obtain the matrix
$$\tilde{\mathbf{B}}_4 = \begin{bmatrix} \mathbf{C}_2 & \mathbf{C}_2 \\ \mathbf{D}_2 & -\mathbf{D}_2 \end{bmatrix}.$$
Permuting the rows of the matrix $\mathbf{C}_4$ in the same way, we obtain the matrix
$$\tilde{\mathbf{C}}_4 = \begin{bmatrix} \mathbf{F}_2 & \mathbf{F}_2 \\ \mathbf{G}_2 & -\mathbf{G}_2 \end{bmatrix}.$$
(The blocks $\mathbf{A}_2, \dots, \mathbf{G}_2$ are listed below.)
Next, we define the permutation
$$\tilde{\pi}_4 = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 1 & 2 & 4 & 3 \end{pmatrix}.$$
Permuting the rows and columns of the matrix $\mathbf{D}_4$ according to the permutation $\tilde{\pi}_4$, we obtain the matrix
$$\tilde{\mathbf{D}}_4 = \begin{bmatrix} b & d & h & f \\ d & -b & f & -h \\ h & f & b & d \\ f & -h & d & -b \end{bmatrix} = \mathbf{P}_4(\tilde{\pi}_4)\,\mathbf{D}_4\,\mathbf{P}_4(\tilde{\pi}_4) = \begin{bmatrix} \tilde{\mathbf{J}}_2 & \tilde{\mathbf{K}}_2 \\ \tilde{\mathbf{K}}_2 & \tilde{\mathbf{J}}_2 \end{bmatrix},$$
where
$$\mathbf{P}_4(\tilde{\pi}_4) = \begin{bmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&0&1 \\ 0&0&1&0 \end{bmatrix}, \quad \tilde{\mathbf{J}}_2 = \begin{bmatrix} b & d \\ d & -b \end{bmatrix}, \quad \tilde{\mathbf{K}}_2 = \begin{bmatrix} h & f \\ f & -h \end{bmatrix}.$$
Then
$$\mathbf{D}_4 = \mathbf{P}_4(\tilde{\pi}_4)(\mathbf{H}_2 \otimes \mathbf{I}_2)\Big[\tfrac{1}{2}(\tilde{\mathbf{J}}_2 + \tilde{\mathbf{K}}_2) \oplus \tfrac{1}{2}(\tilde{\mathbf{J}}_2 - \tilde{\mathbf{K}}_2)\Big](\mathbf{H}_2 \otimes \mathbf{I}_2)\,\mathbf{P}_4(\tilde{\pi}_4),$$
where
$$\tfrac{1}{2}(\tilde{\mathbf{J}}_2 + \tilde{\mathbf{K}}_2) = \frac{1}{2}\begin{bmatrix} b+h & d+f \\ d+f & -(b+h) \end{bmatrix} = \mathbf{J}_2, \quad \tfrac{1}{2}(\tilde{\mathbf{J}}_2 - \tilde{\mathbf{K}}_2) = \frac{1}{2}\begin{bmatrix} b-h & d-f \\ d-f & -(b-h) \end{bmatrix} = \mathbf{K}_2.$$
Taking into account the matrix transformations performed above, we can write
$$\mathbf{Y}_{16 \times 1} = \mathbf{W}_{16}^{(0)} \tilde{\mathbf{W}}_{16}^{(0)} \mathbf{W}_{16}^{(4)} \mathbf{P}_{16}^{(4)} \mathbf{D}_{16} \mathbf{P}_{16}^{(3)} \mathbf{W}_{16}^{(3)} \tilde{\mathbf{W}}_{16}^{(1)}\,\mathbf{P}_{16}^{(0)}\,\mathbf{X}_{16 \times 1},$$
where
$$\mathbf{D}_{16} = \mathbf{A}_2 \oplus \mathbf{B}_2 \oplus \mathbf{C}_2 \oplus \mathbf{D}_2 \oplus \mathbf{F}_2 \oplus \mathbf{G}_2 \oplus \mathbf{J}_2 \oplus \mathbf{K}_2,$$
$$\mathbf{P}_{16}^{(4)} = \mathbf{I}_4 \oplus \mathbf{P}_4(\pi_4) \oplus \mathbf{P}_4(\pi_4) \oplus \mathbf{P}_4(\tilde{\pi}_4), \quad \mathbf{W}_{16}^{(4)} = \mathbf{W}_4^{(0)} \oplus \mathbf{I}_8 \oplus \mathbf{W}_4^{(0)},$$
$$\mathbf{W}_{16}^{(3)} = \mathbf{I}_4 \oplus \mathbf{W}_4^{(0)} \oplus \mathbf{W}_4^{(0)} \oplus \mathbf{W}_4^{(0)}, \quad \mathbf{P}_{16}^{(3)} = \mathbf{P}_4(\pi_4) \oplus \mathbf{I}_8 \oplus \mathbf{P}_4(\tilde{\pi}_4),$$
$$\mathbf{A}_2 = \begin{bmatrix} a & a \\ a & -a \end{bmatrix}, \quad \mathbf{B}_2 = \begin{bmatrix} a & a \\ e & -e \end{bmatrix}, \quad \mathbf{C}_2 = \mathbf{F}_2 = \begin{bmatrix} a & a \\ e & -e \end{bmatrix}, \quad \mathbf{D}_2 = \mathbf{G}_2 = \begin{bmatrix} c & g \\ g & c \end{bmatrix},$$
$$\mathbf{J}_2 = \frac{1}{2}\begin{bmatrix} b+h & d+f \\ d+f & -(b+h) \end{bmatrix}, \quad \mathbf{K}_2 = \frac{1}{2}\begin{bmatrix} b-h & d-f \\ d-f & -(b-h) \end{bmatrix}.$$
In turn, the matrices $\mathbf{A}_2$, $\mathbf{B}_2$, $\mathbf{C}_2$, $\mathbf{D}_2$, $\mathbf{F}_2$, $\mathbf{G}_2$, $\mathbf{J}_2$ and $\mathbf{K}_2$ also have structures that admit effective factorization, which decreases the multiplicative complexity of the calculations:
$$\mathbf{A}_2 = (a_{16} \oplus a_{16})\,\mathbf{H}_2, \quad \mathbf{B}_2 = (a_{16} \oplus e_{16})\,\mathbf{H}_2, \quad \mathbf{C}_2 = \mathbf{F}_2 = (a_{16} \oplus e_{16})\,\mathbf{H}_2,$$
$$\mathbf{D}_2 = \mathbf{G}_2 = \mathbf{H}_2\,\frac{1}{2}\big[(c_{16} + g_{16}) \oplus (c_{16} - g_{16})\big]\,\mathbf{H}_2,$$
$$\mathbf{J}_2 = \mathbf{T}_{2 \times 3}\,\frac{1}{2}\begin{bmatrix} j_{21} & 0 & 0 \\ 0 & j_{22} & 0 \\ 0 & 0 & j_{23} \end{bmatrix}\mathbf{T}_{3 \times 2}, \quad \mathbf{K}_2 = \mathbf{T}_{2 \times 3}\,\frac{1}{2}\begin{bmatrix} k_{21} & 0 & 0 \\ 0 & k_{22} & 0 \\ 0 & 0 & k_{23} \end{bmatrix}\mathbf{T}_{3 \times 2},$$
where
$$\mathbf{T}_{2 \times 3} = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix}, \quad \mathbf{T}_{3 \times 2} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 1 \end{bmatrix},$$
$$j_{21} = (b_{16} + h_{16}) - (d_{16} + f_{16}), \quad j_{22} = -\big[(b_{16} + h_{16}) + (d_{16} + f_{16})\big], \quad j_{23} = d_{16} + f_{16},$$
$$k_{21} = (b_{16} - h_{16}) - (d_{16} - f_{16}), \quad k_{22} = -\big[(b_{16} - h_{16}) + (d_{16} - f_{16})\big], \quad k_{23} = d_{16} - f_{16}.$$
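The T-matrix scheme computes a product by a 2 × 2 matrix of the form $\frac{1}{2}\begin{bmatrix} p & q \\ q & -p \end{bmatrix}$ with three multiplications instead of four. A numerical sketch for $\mathbf{J}_2$ (variable names ours):

```python
import numpy as np

w = np.exp(-2j * np.pi / 16)
b16, d16, f16, h16 = w, w**3, w**5, w**7

p, q = b16 + h16, d16 + f16
J2 = 0.5 * np.array([[p, q], [q, -p]])

T23 = np.array([[1, 0, 1], [0, 1, 1]])
T32 = np.array([[1, 0], [0, 1], [1, 1]])
j21, j22, j23 = p - q, -(p + q), q      # the three diagonal multipliers
J2_fact = T23 @ (0.5 * np.diag([j21, j22, j23])) @ T32
```

The same pattern with the difference terms $b_{16} - h_{16}$ and $d_{16} - f_{16}$ factorizes $\mathbf{K}_2$.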
Combining the above partial decompositions in a single procedure, we can rewrite (13) as follows:
$$\mathbf{Y}_{16 \times 1} = \mathbf{W}_{16}^{(0)} \tilde{\mathbf{W}}_{16}^{(0)} \mathbf{W}_{16}^{(4)} \mathbf{P}_{16}^{(4)} \mathbf{W}_{16}^{(6)} \mathbf{A}_{16 \times 18} \mathbf{D}_{18} \mathbf{A}_{18 \times 16} \mathbf{W}_{16}^{(5)} \mathbf{P}_{16}^{(3)} \mathbf{W}_{16}^{(3)} \tilde{\mathbf{W}}_{16}^{(1)}\,\mathbf{P}_{16}^{(0)}\,\mathbf{X}_{16 \times 1},$$
where
$$\mathbf{W}_{16}^{(5)} = (\mathbf{I}_6 \otimes \mathbf{H}_2) \oplus \mathbf{I}_4, \quad \mathbf{W}_{16}^{(6)} = \mathbf{I}_6 \oplus \mathbf{H}_2 \oplus \mathbf{I}_2 \oplus \mathbf{H}_2 \oplus \mathbf{I}_4,$$
$$\mathbf{A}_{16 \times 18} = \mathbf{I}_{12} \oplus \mathbf{T}_{2 \times 3} \oplus \mathbf{T}_{2 \times 3}, \quad \mathbf{A}_{18 \times 16} = \mathbf{I}_{12} \oplus \mathbf{T}_{3 \times 2} \oplus \mathbf{T}_{3 \times 2},$$
$$\mathbf{D}_{18} = \mathrm{diag}(\varphi_0, \varphi_1, \dots, \varphi_{17}),$$
$$\varphi_0 = \varphi_1 = \varphi_2 = \varphi_4 = \varphi_8 = a_{16} = 1, \quad \varphi_3 = \varphi_5 = \varphi_9 = e_{16} = -j,$$
$$\varphi_6 = \varphi_{10} = \tfrac{1}{2}(c_{16} + g_{16}) = -j0.7071, \quad \varphi_7 = \varphi_{11} = \tfrac{1}{2}(c_{16} - g_{16}) = 0.7071,$$
$$\varphi_{12} = \tfrac{1}{2}\big[(b_{16} + h_{16}) - (d_{16} + f_{16})\big] = j0.5412,$$
$$\varphi_{13} = -\tfrac{1}{2}\big[(b_{16} + h_{16}) + (d_{16} + f_{16})\big] = j1.3066,$$
$$\varphi_{14} = \tfrac{1}{2}(d_{16} + f_{16}) = -j0.9239,$$
$$\varphi_{15} = \tfrac{1}{2}\big[(b_{16} - h_{16}) - (d_{16} - f_{16})\big] = 0.5412,$$
$$\varphi_{16} = -\tfrac{1}{2}\big[(b_{16} - h_{16}) + (d_{16} - f_{16})\big] = -1.3066,$$
$$\varphi_{17} = \tfrac{1}{2}(d_{16} - f_{16}) = 0.3827.$$
Figure 3 shows a data flow graph of the synthesized algorithm for the 16-point DFT. As can be seen, in this case the algorithm takes 10 multiplications and 74 additions.
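The nesting noted after (3) — each expansion contains the smaller ones — suggests a recursive sketch of scheme (4). Here the prefix matrix Q is applied directly rather than factorized, so this checks the correctness of the decomposition, not the final operation counts (function name ours):

```python
import numpy as np

def winograd_type_dft(x):
    """Recursive DFT via Y = (H2 (x) I)(E (+) Q) P X from scheme (4)."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    if N == 1:
        return x.copy()
    M = N // 2
    k = np.arange(M)[:, None]
    n = np.arange(M)[None, :]
    Q = np.exp(-2j * np.pi * k * (2 * n + 1) / N)  # prefix matrix Q_M
    t = winograd_type_dft(x[0::2])                 # E_M applied to the even samples
    u = Q @ x[1::2]                                # Q_M applied to the odd samples
    return np.concatenate([t + u, t - u])          # butterfly H2 (x) I_M
```

For any power-of-two length this reproduces the DFT; the paper's contribution is the further factorization of each Q into the sparse and diagonal factors listed above.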

4.4. Fast DFT Algorithm for N = 32

Now let us consider the synthesis of a similar algorithm for N = 32. In matrix–vector notation, we can rewrite the DFT in the following form:
$$\mathbf{Y}_{32 \times 1} = \mathbf{E}_{32} \mathbf{X}_{32 \times 1},$$
where
$$\mathbf{X}_{32 \times 1} = [x_0, x_1, x_2, \dots, x_{29}, x_{30}, x_{31}]^{\mathrm{T}}, \quad \mathbf{Y}_{32 \times 1} = [y_0, y_1, y_2, \dots, y_{29}, y_{30}, y_{31}]^{\mathrm{T}}.$$
Let us define the permutation $\pi_{32}$ in the following form (positions 1–16 receive the odd-numbered samples and positions 17–32 the even-numbered ones):
$$\pi_{32} = \begin{pmatrix} 1 & 2 & 3 & \cdots & 16 & 17 & 18 & \cdots & 32 \\ 1 & 3 & 5 & \cdots & 31 & 2 & 4 & \cdots & 32 \end{pmatrix}.$$
Permuting the columns of the matrix $\mathbf{E}_{32}$ according to the permutation $\pi_{32}$, we obtain the matrix
$$\tilde{\mathbf{E}}_{32} = \begin{bmatrix} \mathbf{E}_{16} & \mathbf{Q}_{16} \\ \mathbf{E}_{16} & -\mathbf{Q}_{16} \end{bmatrix}, \quad \text{so that} \quad \mathbf{E}_{32} = \tilde{\mathbf{E}}_{32}\,\mathbf{P}_{32}(\pi_{32}).$$
According to the concept, expression (4) for a given transform size can be rewritten as follows:
$$\mathbf{Y}_{32 \times 1} = \mathbf{W}_{32}^{(0)}(\mathbf{E}_{16} \oplus \mathbf{Q}_{16})\,\mathbf{P}_{32}(\pi_{32})\,\mathbf{X}_{32 \times 1},$$
where
$$\mathbf{W}_{32}^{(0)} = \mathbf{H}_2 \otimes \mathbf{I}_{16},$$
$$\mathbf{P}_{32}(\pi_{32}) = \begin{bmatrix} \check{\mathbf{P}}_{16 \times 32} \\ \hat{\mathbf{P}}_{16 \times 32} \end{bmatrix}, \quad \check{\mathbf{P}}_{16 \times 32} = \bigoplus_{i=0}^{3} \check{\mathbf{P}}_{4 \times 8}^{(i)}, \quad \hat{\mathbf{P}}_{16 \times 32} = \bigoplus_{i=0}^{3} \hat{\mathbf{P}}_{4 \times 8}^{(i)},$$
$$\check{\mathbf{P}}_{4 \times 8}^{(i)} = \begin{bmatrix} 1&0&0&0&0&0&0&0 \\ 0&0&1&0&0&0&0&0 \\ 0&0&0&0&1&0&0&0 \\ 0&0&0&0&0&0&1&0 \end{bmatrix}, \quad \hat{\mathbf{P}}_{4 \times 8}^{(i)} = \begin{bmatrix} 0&1&0&0&0&0&0&0 \\ 0&0&0&1&0&0&0&0 \\ 0&0&0&0&0&1&0&0 \\ 0&0&0&0&0&0&0&1 \end{bmatrix}.$$
E 16 is the same as in the algorithm for N = 16, so we will skip this part.
$\mathbf{Q}_{16}$ is a new matrix. Permuting the rows of the matrix $\mathbf{Q}_{16}$ according to the permutation $\pi_{16}$, we obtain the matrix
$$\tilde{\mathbf{Q}}_{16} = \begin{bmatrix} \mathbf{A}_8 & \mathbf{A}_8 \\ \mathbf{B}_8 & -\mathbf{B}_8 \end{bmatrix} = \mathbf{P}_{16}(\pi_{16})\,\mathbf{Q}_{16},$$
where (all entries carry the subscript 32, omitted for brevity)
$$\mathbf{A}_8 = \begin{bmatrix}
a & a & a & a & a & a & a & a \\
j & k & l & m & -j & -k & -l & -m \\
n & o & -n & -o & n & o & -n & -o \\
k & -j & -m & l & -k & j & m & -l \\
p & -p & p & -p & p & -p & p & -p \\
l & -m & -j & k & -l & m & j & -k \\
o & n & -o & -n & o & n & -o & -n \\
m & l & k & j & -m & -l & -k & -j
\end{bmatrix},$$
$$\mathbf{B}_8 = \begin{bmatrix}
b & c & d & e & f & g & h & i \\
c & f & i & -d & -g & b & e & h \\
d & i & -f & c & h & -e & b & g \\
e & -d & c & -b & -i & h & -g & f \\
f & -g & h & -i & -b & c & -d & e \\
g & b & -e & h & c & -f & i & d \\
h & e & b & -g & -d & i & f & c \\
i & h & g & f & e & d & c & b
\end{bmatrix},$$
where
$$a_{32} = 1,$$
$$b_{32} = 0.9808 - j0.1951, \quad c_{32} = 0.8315 - j0.5556, \quad d_{32} = 0.5556 - j0.8315, \quad e_{32} = 0.1951 - j0.9808,$$
$$f_{32} = -0.1951 - j0.9808, \quad g_{32} = -0.5556 - j0.8315, \quad h_{32} = -0.8315 - j0.5556, \quad i_{32} = -0.9808 - j0.1951,$$
$$j_{32} = 0.9239 - j0.3827, \quad k_{32} = 0.3827 - j0.9239, \quad l_{32} = -0.3827 - j0.9239, \quad m_{32} = -0.9239 - j0.3827,$$
$$n_{32} = 0.7071 - j0.7071, \quad o_{32} = -0.7071 - j0.7071, \quad p_{32} = -j.$$
Taking into account the above factorization scheme, we can write
$$\mathbf{Y}_{32 \times 1} = \mathbf{W}_{32}^{(0)} \tilde{\mathbf{W}}_{32}^{(0)}(\mathbf{E}_8 \oplus \mathbf{Q}_8 \oplus \mathbf{A}_8 \oplus \mathbf{B}_8)\,\tilde{\mathbf{W}}_{32}^{(1)}\,\mathbf{P}_{32}(\pi_{32})\,\mathbf{X}_{32 \times 1},$$
where
$$\check{\mathbf{P}}_{8 \times 4} = \check{\mathbf{P}}_{4 \times 8}^{\mathrm{T}}, \quad \hat{\mathbf{P}}_{8 \times 4} = \hat{\mathbf{P}}_{4 \times 8}^{\mathrm{T}},$$
$$\check{\mathbf{P}}_{16 \times 8} = \check{\mathbf{P}}_{8 \times 4} \oplus \check{\mathbf{P}}_{8 \times 4}, \quad \hat{\mathbf{P}}_{16 \times 8} = \hat{\mathbf{P}}_{8 \times 4} \oplus \hat{\mathbf{P}}_{8 \times 4},$$
$$\dot{\mathbf{P}}_{16} = \begin{bmatrix} \check{\mathbf{P}}_{16 \times 8} & \hat{\mathbf{P}}_{16 \times 8} \end{bmatrix},$$
$$\tilde{\mathbf{W}}_{32}^{(0)} = \mathbf{W}_{16}^{(0)} \oplus \dot{\mathbf{P}}_{16}, \quad \tilde{\mathbf{W}}_{32}^{(1)} = \dot{\mathbf{P}}_{16} \oplus \mathbf{W}_{16}^{(0)}.$$
$\mathbf{E}_8$ and $\mathbf{Q}_8$ are the same as in the algorithm for N = 16, so we will skip this part.
$\mathbf{A}_8$ and $\mathbf{B}_8$ are new matrices from the bottom half of the algorithm for N = 32. Permuting the rows of the matrix $\mathbf{A}_8$ according to the permutation $\pi_8$, we obtain the matrix
$$\tilde{\mathbf{A}}_8 = \begin{bmatrix} \mathbf{F}_4 & \mathbf{F}_4 \\ \mathbf{G}_4 & -\mathbf{G}_4 \end{bmatrix} = \mathbf{P}_8(\pi_8)\,\mathbf{A}_8,$$
where
$$\mathbf{F}_4 = \begin{bmatrix} a & a & a & a \\ n & o & -n & -o \\ p & -p & p & -p \\ o & n & -o & -n \end{bmatrix} \quad \text{and} \quad \mathbf{G}_4 = \begin{bmatrix} j & k & l & m \\ k & -j & -m & l \\ l & -m & -j & k \\ m & l & k & j \end{bmatrix}.$$
Let us now define the permutation $\pi_8^{(2)}$ in the following form:
$$\pi_8^{(2)} = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 1 & 2 & 3 & 4 & 8 & 7 & 6 & 5 \end{pmatrix}.$$
Permutation $\pi_8^{(2)}$ can be written as a matrix in this way:
$$\mathbf{P}_8(\pi_8^{(2)}) = \begin{bmatrix}
1&0&0&0&0&0&0&0 \\
0&1&0&0&0&0&0&0 \\
0&0&1&0&0&0&0&0 \\
0&0&0&1&0&0&0&0 \\
0&0&0&0&0&0&0&1 \\
0&0&0&0&0&0&1&0 \\
0&0&0&0&0&1&0&0 \\
0&0&0&0&1&0&0&0
\end{bmatrix}.$$
Then, we permute the rows and columns of the matrix $\mathbf{B}_8$ according to the permutation $\pi_8^{(2)}$. As a result of such a permutation, we obtain the matrix
$$\tilde{\mathbf{B}}_8 = \begin{bmatrix} \mathbf{J}_4 & \mathbf{K}_4 \\ \mathbf{K}_4 & \mathbf{J}_4 \end{bmatrix} = \mathbf{P}_8(\pi_8^{(2)})\,\mathbf{B}_8\,\mathbf{P}_8(\pi_8^{(2)}),$$
where
$$\mathbf{J}_4 = \begin{bmatrix} b & c & d & e \\ c & f & i & -d \\ d & i & -f & c \\ e & -d & c & -b \end{bmatrix} \quad \text{and} \quad \mathbf{K}_4 = \begin{bmatrix} i & h & g & f \\ h & e & b & -g \\ g & b & -e & h \\ f & -g & h & -i \end{bmatrix}.$$
To reduce the computational complexity for the matrix $\mathbf{B}_8$, we perform the following calculations:
$$\tilde{\mathbf{B}}_8 = \begin{bmatrix} \mathbf{J}_4 & \mathbf{K}_4 \\ \mathbf{K}_4 & \mathbf{J}_4 \end{bmatrix} = (\mathbf{H}_2 \otimes \mathbf{I}_4)(\tilde{\mathbf{J}}_4 \oplus \tilde{\mathbf{K}}_4)(\mathbf{H}_2 \otimes \mathbf{I}_4),$$
where
$$\tilde{\mathbf{J}}_4 = \tfrac{1}{2}(\mathbf{J}_4 + \mathbf{K}_4) = \frac{1}{2}\begin{bmatrix}
b+i & c+h & d+g & e+f \\
c+h & f+e & i+b & -(d+g) \\
d+g & i+b & -(f+e) & c+h \\
e+f & -(d+g) & c+h & -(b+i)
\end{bmatrix}$$
and
$$\tilde{\mathbf{K}}_4 = \tfrac{1}{2}(\mathbf{J}_4 - \mathbf{K}_4) = \frac{1}{2}\begin{bmatrix}
b-i & c-h & d-g & e-f \\
c-h & f-e & i-b & -(d-g) \\
d-g & i-b & -(f-e) & c-h \\
e-f & -(d-g) & c-h & -(b-i)
\end{bmatrix}.$$
Taking into account the above factorization scheme, we can write
$$\mathbf{Y}_{32 \times 1} = \mathbf{W}_{32}^{(0)} \tilde{\mathbf{W}}_{32}^{(0)} \mathbf{P}_{32}^{(3)} \mathbf{W}_{32}^{(3)} \mathbf{D}_{32} \mathbf{W}_{32}^{(4)} \mathbf{P}_{32}^{(4)} \tilde{\mathbf{W}}_{32}^{(1)}\,\mathbf{P}_{32}(\pi_{32})\,\mathbf{X}_{32 \times 1},$$
where
$$\mathbf{P}_{32}^{(3)} = \mathbf{P}_{16}^{(2)} \oplus \mathbf{P}_8(\pi_8) \oplus \mathbf{P}_8(\pi_8^{(2)}), \quad \mathbf{W}_{32}^{(3)} = \mathbf{W}_{16}^{(2)} \oplus \mathbf{I}_8 \oplus \mathbf{W}_8^{(0)},$$
$$\mathbf{D}_{32} = \mathbf{A}_4 \oplus \mathbf{B}_4 \oplus \mathbf{C}_4 \oplus \mathbf{D}_4 \oplus \mathbf{F}_4 \oplus \mathbf{G}_4 \oplus \tilde{\mathbf{J}}_4 \oplus \tilde{\mathbf{K}}_4,$$
$$\mathbf{W}_{32}^{(4)} = \mathbf{W}_{16}^{(1)} \oplus \mathbf{W}_8^{(0)} \oplus \mathbf{W}_8^{(0)}, \quad \mathbf{P}_{32}^{(4)} = \mathbf{P}_{16}^{(1)} \oplus \mathbf{I}_8 \oplus \mathbf{P}_8(\pi_8^{(2)}),$$
where
$$\mathbf{P}_{16}^{(1)} = \mathbf{P}_8(\pi_8) \oplus \mathbf{I}_8, \quad \mathbf{P}_{16}^{(2)} = \mathbf{I}_8 \oplus \mathbf{P}_8(\pi_8)^{\mathrm{T}},$$
$$\mathbf{W}_{16}^{(1)} = \mathbf{I}_8 \oplus \mathbf{W}_8^{(0)}, \quad \mathbf{W}_{16}^{(2)} = \mathbf{W}_8^{(0)} \oplus \mathbf{I}_8.$$
$\mathbf{A}_4$, $\mathbf{B}_4$, $\mathbf{C}_4$ and $\mathbf{D}_4$ are the same as in the algorithm for N = 16, so we will skip this part. $\mathbf{F}_4$, $\mathbf{G}_4$, $\tilde{\mathbf{J}}_4$ and $\tilde{\mathbf{K}}_4$ are new matrices from the bottom half of the algorithm for N = 32. Permuting the rows of the matrix $\mathbf{F}_4$ according to the permutation $\pi_4$, we obtain the matrix
$$\tilde{\mathbf{F}}_4 = \begin{bmatrix} \mathbf{L}_2 & \mathbf{L}_2 \\ \mathbf{M}_2 & -\mathbf{M}_2 \end{bmatrix} = \mathbf{P}_4(\pi_4)\,\mathbf{F}_4,$$
where
$$\mathbf{L}_2 = \begin{bmatrix} a & a \\ p & -p \end{bmatrix} \quad \text{and} \quad \mathbf{M}_2 = \begin{bmatrix} n & o \\ o & n \end{bmatrix}.$$
Permuting the rows and columns of the matrix $\mathbf{G}_4$ according to the permutation $\tilde{\pi}_4$, we obtain the matrix
$$\tilde{\mathbf{G}}_4 = \begin{bmatrix} \mathbf{N}_2 & \mathbf{O}_2 \\ \mathbf{O}_2 & \mathbf{N}_2 \end{bmatrix} = \mathbf{P}_4(\tilde{\pi}_4)\,\mathbf{G}_4\,\mathbf{P}_4(\tilde{\pi}_4),$$
where
$$\mathbf{N}_2 = \begin{bmatrix} j & k \\ k & -j \end{bmatrix} \quad \text{and} \quad \mathbf{O}_2 = \begin{bmatrix} m & l \\ l & -m \end{bmatrix}.$$
To reduce the computational complexity for the matrix $\mathbf{G}_4$, we perform the following calculations:
$$\tilde{\mathbf{G}}_4 = \begin{bmatrix} \mathbf{N}_2 & \mathbf{O}_2 \\ \mathbf{O}_2 & \mathbf{N}_2 \end{bmatrix} = (\mathbf{H}_2 \otimes \mathbf{I}_2)(\tilde{\mathbf{N}}_2 \oplus \tilde{\mathbf{O}}_2)(\mathbf{H}_2 \otimes \mathbf{I}_2),$$
where
$$\tilde{\mathbf{N}}_2 = \tfrac{1}{2}(\mathbf{N}_2 + \mathbf{O}_2) = \frac{1}{2}\begin{bmatrix} j+m & k+l \\ k+l & -(j+m) \end{bmatrix}$$
and
$$\tilde{\mathbf{O}}_2 = \tfrac{1}{2}(\mathbf{N}_2 - \mathbf{O}_2) = \frac{1}{2}\begin{bmatrix} j-m & k-l \\ k-l & -(j-m) \end{bmatrix}.$$
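The block identity behind this step — $\begin{bmatrix} \mathbf{N} & \mathbf{O} \\ \mathbf{O} & \mathbf{N} \end{bmatrix} = (\mathbf{H}_2 \otimes \mathbf{I}_2)(\tilde{\mathbf{N}}_2 \oplus \tilde{\mathbf{O}}_2)(\mathbf{H}_2 \otimes \mathbf{I}_2)$ — can be confirmed numerically. Below we assume $j_{32} = w^2$, $k_{32} = w^6$, $l_{32} = w^{10}$, $m_{32} = w^{14}$ with $w = e^{-j2\pi/32}$, matching the listed constant values:

```python
import numpy as np

H2 = np.array([[1, 1], [1, -1]], dtype=complex)
w = np.exp(-2j * np.pi / 32)
j32, k32, l32, m32 = w**2, w**6, w**10, w**14   # assumed twiddle identifications

N2 = np.array([[j32, k32], [k32, -j32]])
O2 = np.array([[m32, l32], [l32, -m32]])
G4t = np.block([[N2, O2], [O2, N2]])            # the permuted matrix G~4
H4 = np.kron(H2, np.eye(2))                     # H2 (x) I2
Nt = (N2 + O2) / 2                              # N~2
Ot = (N2 - O2) / 2                              # O~2
zeros = np.zeros((2, 2))
G4t_fact = H4 @ np.block([[Nt, zeros], [zeros, Ot]]) @ H4
```

The half-sum block comes out purely imaginary and the half-difference block purely real, which is what makes the subsequent multipliers cheap.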
Let us define the permutations $\pi_4^{(1)}$ and $\pi_4^{(2)}$ in the following form:
$$\pi_4^{(1)} = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 3 & 2 & 1 & 4 \end{pmatrix} \quad \text{and} \quad \pi_4^{(2)} = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 1 & 4 & 3 & 2 \end{pmatrix}.$$
Permutations $\pi_4^{(1)}$ and $\pi_4^{(2)}$ can be written as matrices in this way:
$$\mathbf{P}_4(\pi_4^{(1)}) = \begin{bmatrix} 0&0&1&0 \\ 0&1&0&0 \\ 1&0&0&0 \\ 0&0&0&1 \end{bmatrix} \quad \text{and} \quad \mathbf{P}_4(\pi_4^{(2)}) = \begin{bmatrix} 1&0&0&0 \\ 0&0&0&1 \\ 0&0&1&0 \\ 0&1&0&0 \end{bmatrix}.$$
Permute the rows of the matrix $\tilde{\mathbf{J}}_4$ according to the permutation $\pi_4^{(1)}$ and its columns according to $\pi_4^{(2)}$. As a result of such a permutation, we obtain the matrix
$$\dot{\mathbf{J}}_4 = \begin{bmatrix} \mathbf{P}_2 & \mathbf{R}_2 \\ \mathbf{S}_2 & \mathbf{P}_2 \end{bmatrix} = \mathbf{P}_4(\pi_4^{(1)})\,\tilde{\mathbf{J}}_4\,\mathbf{P}_4(\pi_4^{(2)}),$$
where
$$\mathbf{P}_2 = \frac{1}{2}\begin{bmatrix} d+g & c+h \\ c+h & -(d+g) \end{bmatrix}, \quad \mathbf{R}_2 = \frac{1}{2}\begin{bmatrix} -(e+f) & b+i \\ b+i & e+f \end{bmatrix},$$
$$\mathbf{S}_2 = \frac{1}{2}\begin{bmatrix} b+i & e+f \\ e+f & -(b+i) \end{bmatrix}.$$
To reduce the computational complexity for the matrix $\dot{\mathbf{J}}_4$, we perform the following calculations:
$$\tilde{\mathbf{J}}_4 = \big(\mathbf{T}_{2 \times 3}^{(1)} \otimes \mathbf{H}_2\big)\big[(\mathbf{S}_2 - \mathbf{P}_2) \oplus (\mathbf{R}_2 - \mathbf{P}_2) \oplus \mathbf{P}_2\big]\big(\mathbf{T}_{3 \times 2} \otimes \mathbf{H}_2\big),$$
where
$$\mathbf{T}_{2 \times 3}^{(1)} = \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix}.$$
The same permutations as for the matrix $\tilde{\mathbf{J}}_4$ are applied to the matrix $\tilde{\mathbf{K}}_4$. As a result of such permutations, we obtain the matrix
$$\dot{\mathbf{K}}_4 = \begin{bmatrix} \mathbf{T}_2 & \mathbf{U}_2 \\ \mathbf{W}_2 & \mathbf{T}_2 \end{bmatrix} = \mathbf{P}_4(\pi_4^{(1)})\,\tilde{\mathbf{K}}_4\,\mathbf{P}_4(\pi_4^{(2)}),$$
where
$$\mathbf{T}_2 = \frac{1}{2}\begin{bmatrix} d-g & c-h \\ c-h & -(d-g) \end{bmatrix}, \quad \mathbf{U}_2 = \frac{1}{2}\begin{bmatrix} e-f & i-b \\ i-b & -(e-f) \end{bmatrix},$$
$$\mathbf{W}_2 = \frac{1}{2}\begin{bmatrix} b-i & e-f \\ e-f & -(b-i) \end{bmatrix}.$$
To reduce the computational complexity for the matrix $\dot{\mathbf{K}}_4$, we perform the following calculations:
$$\tilde{\mathbf{K}}_4 = \big(\mathbf{T}_{2 \times 3}^{(1)} \otimes \mathbf{H}_2\big)\big[(\mathbf{W}_2 - \mathbf{T}_2) \oplus (\mathbf{U}_2 - \mathbf{T}_2) \oplus \mathbf{T}_2\big]\big(\mathbf{T}_{3 \times 2} \otimes \mathbf{H}_2\big).$$
Taking into account the above factorization scheme, we can write
$$\mathbf{Y}_{32 \times 1} = \mathbf{W}_{32}^{(0)} \tilde{\mathbf{W}}_{32}^{(0)} \mathbf{P}_{32}^{(3)} \mathbf{W}_{32}^{(3)} \mathbf{P}_{32}^{(5)} \mathbf{W}_{32 \times 36} \mathbf{D}_{36} \mathbf{W}_{36 \times 32} \mathbf{P}_{32}^{(6)} \mathbf{W}_{32}^{(4)} \mathbf{P}_{32}^{(4)} \tilde{\mathbf{W}}_{32}^{(1)}\,\mathbf{P}_{32}(\pi_{32})\,\mathbf{X}_{32 \times 1},$$
where
$$\mathbf{P}_{32}^{(5)} = \mathbf{P}_{16}^{(4)} \oplus \mathbf{P}_4(\pi_4) \oplus \mathbf{P}_4(\tilde{\pi}_4) \oplus \mathbf{P}_4(\pi_4^{(1)}) \oplus \mathbf{P}_4(\pi_4^{(1)}),$$
$$\mathbf{W}_{32 \times 36} = \mathbf{W}_{16}^{(4)} \oplus \mathbf{I}_4 \oplus \mathbf{W}_4^{(0)} \oplus \big(\mathbf{T}_{2 \times 3}^{(1)} \otimes \mathbf{H}_2\big) \oplus \big(\mathbf{T}_{2 \times 3}^{(1)} \otimes \mathbf{H}_2\big),$$
$$\mathbf{D}_{36} = \mathbf{D}_{18}^{(1)} \oplus \mathbf{D}_{18}^{(2)},$$
$$\mathbf{D}_{18}^{(1)} = \mathbf{A}_2 \oplus \mathbf{B}_2 \oplus \mathbf{C}_2 \oplus \mathbf{D}_2 \oplus \mathbf{F}_2 \oplus \mathbf{G}_2 \oplus \mathbf{J}_2 \oplus \mathbf{K}_2,$$
$$\mathbf{D}_{18}^{(2)} = \mathbf{L}_2 \oplus \mathbf{M}_2 \oplus \mathbf{N}_2 \oplus \mathbf{O}_2 \oplus \mathbf{P}_2 \oplus \mathbf{R}_2 \oplus \mathbf{S}_2 \oplus \mathbf{T}_2 \oplus \mathbf{U}_2 \oplus \mathbf{W}_2,$$
$$\mathbf{W}_{36 \times 32} = \mathbf{W}_{16}^{(3)} \oplus \mathbf{W}_4^{(0)} \oplus \mathbf{W}_4^{(0)} \oplus \big(\mathbf{T}_{3 \times 2} \otimes \mathbf{H}_2\big) \oplus \big(\mathbf{T}_{3 \times 2} \otimes \mathbf{H}_2\big),$$
$$\mathbf{P}_{32}^{(6)} = \mathbf{P}_{16}^{(3)} \oplus \mathbf{I}_4 \oplus \mathbf{P}_4(\tilde{\pi}_4) \oplus \mathbf{P}_4(\pi_4^{(2)}) \oplus \mathbf{P}_4(\pi_4^{(2)}).$$
In turn, the matrices A 2 , B 2 , C 2 , D 2 , F 2 , G 2 , J 2 and K 2 are the same as in the algorithm for N = 16, so we will skip this part. The matrices L 2 , M 2 , N 2 , O 2 , P 2 , R 2 , S 2 , T 2 , U 2 and W 2 also have structures that provide effective factorization, which leads to a decrease in the multiplicative complexity of calculations:
$$\mathbf{L}_2 = \begin{bmatrix} a_{32} & a_{32} \\ p_{32} & -p_{32} \end{bmatrix} = \begin{bmatrix} a_{32} & 0 \\ 0 & p_{32} \end{bmatrix} \mathbf{H}_2 ,$$

$$\mathbf{M}_2 = \begin{bmatrix} n_{32} & o_{32} \\ o_{32} & n_{32} \end{bmatrix} = \mathbf{H}_2 \operatorname{diag}\Bigl(\tfrac{1}{2}(n_{32}+o_{32}),\ \tfrac{1}{2}(n_{32}-o_{32})\Bigr) \mathbf{H}_2 ,$$

$$\mathbf{N}_2 = \frac{1}{2} \begin{bmatrix} j_{32}+m_{32} & k_{32}+l_{32} \\ k_{32}+l_{32} & -(j_{32}+m_{32}) \end{bmatrix} = \begin{bmatrix} -j0.3827 & -j0.9239 \\ -j0.9239 & j0.3827 \end{bmatrix} = \begin{bmatrix} n_{32}^{(0)} & n_{32}^{(1)} \\ n_{32}^{(1)} & -n_{32}^{(0)} \end{bmatrix} = \mathbf{T}_{2\times 3} \operatorname{diag}\bigl(n_{32}^{(0)}-n_{32}^{(1)},\ n_{32}^{(0)}+n_{32}^{(1)},\ n_{32}^{(1)}\bigr) \mathbf{T}_{3\times 2} ,$$

$$\mathbf{O}_2 = \frac{1}{2} \begin{bmatrix} j_{32}-m_{32} & k_{32}-l_{32} \\ k_{32}-l_{32} & -(j_{32}-m_{32}) \end{bmatrix} = \begin{bmatrix} 0.9239 & 0.3827 \\ 0.3827 & -0.9239 \end{bmatrix} = \begin{bmatrix} o_{32}^{(0)} & o_{32}^{(1)} \\ o_{32}^{(1)} & -o_{32}^{(0)} \end{bmatrix} = \mathbf{T}_{2\times 3} \operatorname{diag}\bigl(o_{32}^{(0)}-o_{32}^{(1)},\ o_{32}^{(0)}+o_{32}^{(1)},\ o_{32}^{(1)}\bigr) \mathbf{T}_{3\times 2} ,$$

$$\mathbf{P}_2 = \frac{1}{2} \begin{bmatrix} b_{32}+i_{32}-(d_{32}+g_{32}) & e_{32}+f_{32}-(c_{32}+h_{32}) \\ e_{32}+f_{32}-(c_{32}+h_{32}) & -\bigl(b_{32}+i_{32}-(d_{32}+g_{32})\bigr) \end{bmatrix} = \begin{bmatrix} j0.6364 & -j0.4252 \\ -j0.4252 & -j0.6364 \end{bmatrix} = \begin{bmatrix} p_{32}^{(0)} & p_{32}^{(1)} \\ p_{32}^{(1)} & -p_{32}^{(0)} \end{bmatrix} = \mathbf{T}_{2\times 3} \operatorname{diag}\bigl(p_{32}^{(0)}-p_{32}^{(1)},\ p_{32}^{(0)}+p_{32}^{(1)},\ p_{32}^{(1)}\bigr) \mathbf{T}_{3\times 2} ,$$

$$\mathbf{R}_2 = \frac{1}{2} \begin{bmatrix} -(e_{32}+f_{32})-(d_{32}+g_{32}) & b_{32}+i_{32}-(c_{32}+h_{32}) \\ b_{32}+i_{32}-(c_{32}+h_{32}) & (e_{32}+f_{32})+(d_{32}+g_{32}) \end{bmatrix} = \begin{bmatrix} j1.8123 & j0.3605 \\ j0.3605 & -j1.8123 \end{bmatrix} = \begin{bmatrix} r_{32}^{(0)} & r_{32}^{(1)} \\ r_{32}^{(1)} & -r_{32}^{(0)} \end{bmatrix} = \mathbf{T}_{2\times 3} \operatorname{diag}\bigl(r_{32}^{(0)}-r_{32}^{(1)},\ r_{32}^{(0)}+r_{32}^{(1)},\ r_{32}^{(1)}\bigr) \mathbf{T}_{3\times 2} ,$$

$$\mathbf{S}_2 = \frac{1}{2} \begin{bmatrix} d_{32}+g_{32} & c_{32}+h_{32} \\ c_{32}+h_{32} & -(d_{32}+g_{32}) \end{bmatrix} = \begin{bmatrix} -j0.8315 & -j0.5556 \\ -j0.5556 & j0.8315 \end{bmatrix} = \begin{bmatrix} s_{32}^{(0)} & s_{32}^{(1)} \\ s_{32}^{(1)} & -s_{32}^{(0)} \end{bmatrix} = \mathbf{T}_{2\times 3} \operatorname{diag}\bigl(s_{32}^{(0)}-s_{32}^{(1)},\ s_{32}^{(0)}+s_{32}^{(1)},\ s_{32}^{(1)}\bigr) \mathbf{T}_{3\times 2} ,$$

$$\mathbf{T}_2 = \frac{1}{2} \begin{bmatrix} b_{32}-i_{32}-(d_{32}-g_{32}) & e_{32}-f_{32}-(c_{32}-h_{32}) \\ e_{32}-f_{32}-(c_{32}-h_{32}) & -\bigl(b_{32}-i_{32}-(d_{32}-g_{32})\bigr) \end{bmatrix} = \begin{bmatrix} 0.4252 & -0.6364 \\ -0.6364 & -0.4252 \end{bmatrix} = \begin{bmatrix} t_{32}^{(0)} & t_{32}^{(1)} \\ t_{32}^{(1)} & -t_{32}^{(0)} \end{bmatrix} = \mathbf{T}_{2\times 3} \operatorname{diag}\bigl(t_{32}^{(0)}-t_{32}^{(1)},\ t_{32}^{(0)}+t_{32}^{(1)},\ t_{32}^{(1)}\bigr) \mathbf{T}_{3\times 2} ,$$

$$\mathbf{U}_2 = \frac{1}{2} \begin{bmatrix} e_{32}-f_{32}-(d_{32}-g_{32}) & i_{32}-b_{32}-(c_{32}-h_{32}) \\ i_{32}-b_{32}-(c_{32}-h_{32}) & -\bigl(e_{32}-f_{32}-(d_{32}-g_{32})\bigr) \end{bmatrix} = \begin{bmatrix} -0.3605 & -1.8123 \\ -1.8123 & 0.3605 \end{bmatrix} = \begin{bmatrix} u_{32}^{(0)} & u_{32}^{(1)} \\ u_{32}^{(1)} & -u_{32}^{(0)} \end{bmatrix} = \mathbf{T}_{2\times 3} \operatorname{diag}\bigl(u_{32}^{(0)}-u_{32}^{(1)},\ u_{32}^{(0)}+u_{32}^{(1)},\ u_{32}^{(1)}\bigr) \mathbf{T}_{3\times 2} ,$$

$$\mathbf{W}_2 = \frac{1}{2} \begin{bmatrix} d_{32}-g_{32} & c_{32}-h_{32} \\ c_{32}-h_{32} & -(d_{32}-g_{32}) \end{bmatrix} = \begin{bmatrix} 0.5556 & 0.8315 \\ 0.8315 & -0.5556 \end{bmatrix} = \begin{bmatrix} w_{32}^{(0)} & w_{32}^{(1)} \\ w_{32}^{(1)} & -w_{32}^{(0)} \end{bmatrix} = \mathbf{T}_{2\times 3} \operatorname{diag}\bigl(w_{32}^{(0)}-w_{32}^{(1)},\ w_{32}^{(0)}+w_{32}^{(1)},\ w_{32}^{(1)}\bigr) \mathbf{T}_{3\times 2} .$$
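Each of these 2×2 blocks is applied to a vector with only three multiplications, since the diagonal entries are precomputed constants. The following sketch illustrates the scheme; the concrete $\mathbf{T}_{2\times 3}$ and $\mathbf{T}_{3\times 2}$ shown here are one consistent choice reproducing the block form above (the paper fixes their exact definitions earlier), and the function name is illustrative:

```python
import numpy as np

# One consistent choice of the fixed matrices T_{2x3} and T_{3x2}
# reproducing blocks of the form [[x0, x1], [x1, -x0]] (assumption:
# the paper defines these matrices earlier in the text).
T23 = np.array([[1.0, 0.0, 1.0],
                [0.0, -1.0, 1.0]])
T32 = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [1.0, 1.0]])

def apply_block(x0, x1, v):
    """Multiply v by [[x0, x1], [x1, -x0]] using only 3 multiplications."""
    m1 = (x0 - x1) * v[0]        # the three diagonal entries
    m2 = (x0 + x1) * v[1]        # (x0 - x1, x0 + x1, x1) are
    m3 = x1 * (v[0] + v[1])      # precomputed constants
    return np.array([m1 + m3, m3 - m2])

# Check against the O_2 block with its numeric values:
x0, x1 = 0.9239, 0.3827
v = np.array([1.0, 2.0])
direct = np.array([[x0, x1], [x1, -x0]]) @ v
assert np.allclose(apply_block(x0, x1, v), direct)
# The same factorization written as T23 @ diag @ T32:
D = np.diag([x0 - x1, x0 + x1, x1])
assert np.allclose(T23 @ D @ T32, np.array([[x0, x1], [x1, -x0]]))
```

The additions introduced by $\mathbf{T}_{2\times 3}$ and $\mathbf{T}_{3\times 2}$ are the price paid for replacing four multiplications with three.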
Taking into account the above factorization scheme, we can finally write
$$\mathbf{Y}_{32\times 1} = \mathbf{W}_{32}^{(0)} \tilde{\mathbf{W}}_{32}^{(0)} \mathbf{P}_{32}^{(3)} \mathbf{W}_{32}^{(3)} \mathbf{P}_{32}^{(5)} \mathbf{W}_{32\times 36} \mathbf{W}_{36}^{(0)} \mathbf{P}_{36\times 46} \mathbf{D}_{46} \mathbf{P}_{46\times 36} \mathbf{W}_{36}^{(1)} \mathbf{W}_{36\times 32} \mathbf{P}_{32}^{(6)} \mathbf{W}_{32}^{(4)} \mathbf{P}_{32}^{(4)} \tilde{\mathbf{W}}_{32}^{(1)} \mathbf{P}_{32}(\pi_{32}) \mathbf{X}_{32\times 1} ,$$
where
$$\mathbf{W}_{36}^{(0)} = \mathbf{W}_{16}^{(6)} \oplus (\mathbf{I}_2 \oplus \mathbf{H}_2) \oplus \mathbf{I}_{16} , \qquad \mathbf{P}_{36\times 46} = \mathbf{A}_{16\times 18} \oplus \mathbf{I}_4 \oplus \bigl(\mathbf{T}_{2\times 3}^{(4)} \otimes \mathbf{I}_8\bigr) ,$$
$$\mathbf{W}_{36}^{(1)} = \mathbf{W}_{16}^{(5)} \oplus \mathbf{H}_2 \oplus \mathbf{H}_2 \oplus \mathbf{I}_{16} , \qquad \mathbf{P}_{46\times 36} = \mathbf{A}_{18\times 16} \oplus \mathbf{I}_4 \oplus \bigl(\mathbf{T}_{3\times 2}^{(3)} \otimes \mathbf{I}_8\bigr) ,$$
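The dimensions of these direct sums can be checked mechanically. The sketch below builds $\mathbf{P}_{36\times 46}$ and $\mathbf{P}_{46\times 36}$ shape-wise under one possible reading of the formulas; the contents of $\mathbf{A}_{16\times 18}$ and $\mathbf{A}_{18\times 16}$ are zero placeholders here (only the sizes are verified), and the $\mathbf{T}$ matrices are the illustrative choice used earlier:

```python
import numpy as np

def direct_sum(*blocks):
    """Block-diagonal (direct sum) of the given matrices."""
    rows = sum(b.shape[0] for b in blocks)
    cols = sum(b.shape[1] for b in blocks)
    out = np.zeros((rows, cols))
    i = j = 0
    for b in blocks:
        out[i:i + b.shape[0], j:j + b.shape[1]] = b
        i += b.shape[0]
        j += b.shape[1]
    return out

T23 = np.array([[1.0, 0.0, 1.0], [0.0, -1.0, 1.0]])   # 2x3
T32 = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # 3x2

# Placeholder shapes only; the real A-matrices are defined in the paper.
A_16x18 = np.zeros((16, 18))
A_18x16 = np.zeros((18, 16))

# 16 + 4 + 2*8 = 36 rows, 18 + 4 + 3*8 = 46 columns:
P_36x46 = direct_sum(A_16x18, np.eye(4), np.kron(T23, np.eye(8)))
P_46x36 = direct_sum(A_18x16, np.eye(4), np.kron(T32, np.eye(8)))
assert P_36x46.shape == (36, 46)
assert P_46x36.shape == (46, 36)
```

The column count 46 matches the size of the diagonal matrix $\mathbf{D}_{46}$: 18 multiplications inherited from the N = 16 part, 2 + 2 for the $\mathbf{L}_2$/$\mathbf{M}_2$ blocks, and 3 each for the eight remaining blocks.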
and finally
$$\mathbf{D}_{46} = \operatorname{diag}(\varphi_0, \varphi_1, \ldots, \varphi_{45}),$$
where
$\varphi_{18} = a_{32} = 1$, $\varphi_{19} = p_{32} = -j$,
$\varphi_{20} = \frac{1}{2}(n_{32}+o_{32}) = -j0.7071$, $\varphi_{21} = \frac{1}{2}(n_{32}-o_{32}) = 0.7071$,
$\varphi_{22} = \frac{1}{2}\bigl((j_{32}+m_{32})-(k_{32}+l_{32})\bigr) = j0.5412$,
$\varphi_{23} = \frac{1}{2}\bigl((j_{32}+m_{32})+(k_{32}+l_{32})\bigr) = -j1.3066$,
$\varphi_{24} = \frac{1}{2}(k_{32}+l_{32}) = -j0.9239$,
$\varphi_{25} = \frac{1}{2}\bigl((j_{32}-m_{32})-(k_{32}-l_{32})\bigr) = 0.5412$,
$\varphi_{26} = \frac{1}{2}\bigl((j_{32}-m_{32})+(k_{32}-l_{32})\bigr) = 1.3066$,
$\varphi_{27} = \frac{1}{2}(k_{32}-l_{32}) = 0.3827$,
$\varphi_{28} = \frac{1}{2}\bigl(b_{32}+i_{32}-(d_{32}+g_{32}) - (e_{32}+f_{32}-(c_{32}+h_{32}))\bigr) = j1.0616$,
$\varphi_{29} = \frac{1}{2}\bigl(b_{32}+i_{32}-(d_{32}+g_{32}) + (e_{32}+f_{32}-(c_{32}+h_{32}))\bigr) = j0.2112$,
$\varphi_{30} = \frac{1}{2}\bigl(e_{32}+f_{32}-(c_{32}+h_{32})\bigr) = -j0.4252$,
$\varphi_{31} = \frac{1}{2}\bigl(-(e_{32}+f_{32})-(d_{32}+g_{32}) - (b_{32}+i_{32}-(c_{32}+h_{32}))\bigr) = j1.4518$,
$\varphi_{32} = \frac{1}{2}\bigl(-(e_{32}+f_{32})-(d_{32}+g_{32}) + (b_{32}+i_{32}-(c_{32}+h_{32}))\bigr) = j2.1727$,
$\varphi_{33} = \frac{1}{2}\bigl(b_{32}+i_{32}-(c_{32}+h_{32})\bigr) = j0.3605$,
$\varphi_{34} = \frac{1}{2}\bigl((d_{32}+g_{32})-(c_{32}+h_{32})\bigr) = -j0.2759$,
$\varphi_{35} = \frac{1}{2}\bigl((d_{32}+g_{32})+(c_{32}+h_{32})\bigr) = -j1.3870$, $\varphi_{36} = \frac{1}{2}(c_{32}+h_{32}) = -j0.5556$,
$\varphi_{37} = \frac{1}{2}\bigl(b_{32}-i_{32}-(d_{32}-g_{32}) - (e_{32}-f_{32}-(c_{32}-h_{32}))\bigr) = 1.0616$,
$\varphi_{38} = \frac{1}{2}\bigl(b_{32}-i_{32}-(d_{32}-g_{32}) + (e_{32}-f_{32}-(c_{32}-h_{32}))\bigr) = -0.2112$,
$\varphi_{39} = \frac{1}{2}\bigl(e_{32}-f_{32}-(c_{32}-h_{32})\bigr) = -0.6364$,
$\varphi_{40} = \frac{1}{2}\bigl(e_{32}-f_{32}-(d_{32}-g_{32}) - (i_{32}-b_{32}-(c_{32}-h_{32}))\bigr) = 1.4518$,
$\varphi_{41} = \frac{1}{2}\bigl(e_{32}-f_{32}-(d_{32}-g_{32}) + (i_{32}-b_{32}-(c_{32}-h_{32}))\bigr) = -2.1727$,
$\varphi_{42} = \frac{1}{2}\bigl(i_{32}-b_{32}-(c_{32}-h_{32})\bigr) = -1.8123$, $\varphi_{43} = \frac{1}{2}\bigl((d_{32}-g_{32})-(c_{32}-h_{32})\bigr) = -0.2759$,
$\varphi_{44} = \frac{1}{2}\bigl((d_{32}-g_{32})+(c_{32}-h_{32})\bigr) = 1.3870$, $\varphi_{45} = \frac{1}{2}(c_{32}-h_{32}) = 0.8315$.
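The magnitudes of these diagonal entries can be reproduced numerically. The sketch below assumes (this mapping is an assumption of the sketch, since the constants are defined earlier in the paper) that $c_{32}$, $d_{32}$, $g_{32}$ and $h_{32}$ denote the twiddle factors $W_{32}^{3}$, $W_{32}^{5}$, $W_{32}^{11}$ and $W_{32}^{13}$:

```python
import cmath

# Assumed reading: x32 constants are twiddle factors W_32^k = exp(-2*pi*j*k/32).
W = lambda k: cmath.exp(-2j * cmath.pi * k / 32)
c32, d32, g32, h32 = W(3), W(5), W(11), W(13)

phi36 = (c32 + h32) / 2                    # purely imaginary entry
phi44 = ((d32 - g32) + (c32 - h32)) / 2    # purely real entry
phi45 = (c32 - h32) / 2

# Sums of conjugate-symmetric pairs are purely imaginary,
# differences are purely real:
assert abs(phi36.real) < 1e-12
assert abs(phi44.imag) < 1e-12
# Magnitudes match the values of phi_36, phi_44 and phi_45 above:
assert abs(abs(phi36) - 0.5556) < 1e-3
assert abs(abs(phi44) - 1.3870) < 1e-3
assert abs(abs(phi45) - 0.8315) < 1e-3
```

This also makes visible why each entry of $\mathbf{D}_{46}$ is purely real or purely imaginary: sums and differences of conjugate-symmetric twiddle-factor pairs always fall on one axis.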
Figure 4 shows a data flow graph of a synthesized algorithm for the 32-point DFT. As can be seen, in this case, the algorithm takes 36 multiplications and 244 additions.

5. Conclusions

In this article, we have shown for the first time a simple, clear and unified approach to the derivation of fast Winograd-like DFT algorithms. The construction of the algorithms is based on the method of synthesis of fast algorithms for computing matrix–vector products described in [19]. The mathematical background is an original method of hierarchical factorization of the DFT matrix, which differs from the factorization used in the Cooley–Tukey FFT. The synthesis method is demonstrated by constructing algorithms for four typical lengths of the input data sequence: N = 4, N = 8, N = 16 and N = 32. As follows from Figure 1, the upper part of the data flow graph for N = 4, outlined by the dotted line, corresponds to the algorithm for N = 2. In turn, the upper part of the data flow graph for N = 8 (see Figure 2), outlined by a dotted line, corresponds to the algorithm for N = 4; the upper part of the graph for N = 16 (Figure 3) corresponds to the algorithm for N = 8; and, finally, the upper part of the graph for N = 32 (Figure 4) corresponds to the algorithm for N = 16. It is easy to verify that algorithms for other lengths of sequences that are powers of two can be synthesized in a similar way. The described method can therefore be considered universal.
The advantage of the presented algorithms over the Cooley–Tukey algorithms is that the critical path in the graph of any of the obtained algorithms contains only one multiplication. More than one multiplication in the critical path creates additional problems for the implementation of computations: multiplying two n-bit operands yields a 2n-bit product, so repeated multiplication requires additional manipulation of the operands and therefore more time and effort than a single multiplication. In fixed-point devices, this can necessitate overflow and underflow handling; if accuracy is to be preserved, double memory access is required both when writing and when reading. Using floating-point arithmetic in this case also creates additional problems related to exponent alignment, mantissa addition, etc. This is what we had in mind when we wrote about this additional advantage. Another important advantage of these algorithms over the Cooley–Tukey algorithms is that the multiplications here are either purely real or purely imaginary. Multiplying two complex numbers requires three multiplications of real numbers, whereas multiplying a complex number by a purely real or purely imaginary number requires only two real multiplications. This leads to an additional reduction in the multiplicative complexity of the computations. These two advantages are typical of all Winograd-type algorithms.

Author Contributions

Conceptualization, A.C.; methodology, A.C. and M.R.; validation, A.C. and M.R.; formal analysis, A.C. and M.R.; investigation, M.R.; writing—original draft, A.C. and M.R.; writing—review and editing, M.R.; supervision, A.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

We would like to thank Dorota Majorkowska-Mech for advice and guidance on how to improve the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. McClellan, J.H.; Rader, C.M. Number Theory in Digital Signal Processing; Prentice-Hall Signal Processing Series; Prentice-Hall: Englewood Cliffs, NJ, USA, 1979.
2. Elliott, D.F.; Rao, K.R. Fast Transforms: Algorithms, Analyses, Applications; Academic Press: Cambridge, MA, USA, 1982.
3. Elliott, D.F. (Ed.) Handbook of Digital Signal Processing: Engineering Applications; Academic Press: San Diego, CA, USA, 1987.
4. Nussbaumer, H.J. Fast Fourier Transform and Convolution Algorithms; Springer Series in Information Sciences; Springer: Berlin/Heidelberg, Germany, 1982; Volume 2.
5. Burrus, C.S.; Parks, T.W.; Potts, J.F. DFT/FFT and Convolution Algorithms: Theory and Implementation; Topics in Digital Signal Processing; Wiley: New York, NY, USA, 1984.
6. Blahut, R.E. Fast Algorithms for Digital Signal Processing; Addison-Wesley: Reading, MA, USA, 1985.
7. Tolimieri, R.; An, M.; Lu, C. Algorithms for Discrete Fourier Transform and Convolution; Signal Processing and Digital Filtering; Springer: New York, NY, USA, 1989.
8. Smith, W.W.; Smith, J.M. Handbook of Real-Time Fast Fourier Transforms: Algorithms to Product Testing; IEEE Press: New York, NY, USA, 1995.
9. Parhi, K.K. VLSI Digital Signal Processing Systems: Design and Implementation; Wiley: New York, NY, USA, 1999.
10. Moon, T.K.; Stirling, W.C. Mathematical Methods and Algorithms for Signal Processing; Prentice Hall: Upper Saddle River, NJ, USA, 2000.
11. Garg, H.K. Digital Signal Processing Algorithms: Number Theory, Convolution, Fast Fourier Transforms, and Applications; Computer Engineering Series; CRC Press: Boca Raton, FL, USA, 1998.
12. Bi, G.; Zeng, Y. Transforms and Fast Algorithms for Signal Analysis and Representations; Birkhäuser: Boston, MA, USA, 2004.
13. Burrus, C.S.; Selesnick, I. Winograd’s Short DFT Algorithms; OpenStax CNX, Rice University: Houston, TX, USA. Available online: http://cnx.org/content/m16333/latest (accessed on 15 March 2022).
14. Winograd, S. On computing the discrete Fourier transform. Math. Comput. 1978, 32, 175–199.
15. Silverman, H. An introduction to programming the Winograd Fourier transform algorithm (WFTA). IEEE Trans. Acoust. Speech Signal Process. 1977, 25, 152–165.
16. Regalia, P.A.; Mitra, S.K. Kronecker Products, Unitary Matrices and Signal Processing Applications. SIAM Rev. 1989, 31, 586–613.
17. Steeb, W.H.; Hardy, Y. Matrix Calculus and Kronecker Product: A Practical Approach to Linear and Multilinear Algebra, 2nd ed.; World Scientific: Singapore, 2011.
18. Beck, A.; Tetruashvili, L. On the Convergence of Block Coordinate Descent Type Methods. SIAM J. Optim. 2013, 23, 2037–2060.
19. Cariow, A. Strategies for the Synthesis of Fast Algorithms for the Computation of the Matrix-Vector Products. J. Signal Process. Theory Appl. 2014.
20. Andreatto, B.; Cariow, A. Automatic generation of fast algorithms for matrix–vector multiplication. Int. J. Comput. Math. 2018, 95, 626–644.
Figure 1. The data flow graph of the proposed algorithm for computation of 4-point DFT.
Figure 2. The data flow graph of the proposed algorithm for computation of 8-point DFT.
Figure 3. The data flow graph of the proposed algorithm for computation of 16-point DFT.
Figure 4. The data flow graph of the proposed algorithm for computation of 32-point DFT.