A Variable-Size FFT Hardware Accelerator Based on Matrix Transposition
The fast Fourier transform (FFT) is the core, and the most time-consuming, algorithm in digital signal processing, and the FFT sizes required by different applications vary widely. This paper therefore proposes a variable-size FFT hardware accelerator that fully supports the IEEE-754 single-precision floating-point standard and FFT sizes ranging from 2 to 2^20 points. First, a parallel Cooley–Tukey FFT algorithm based on matrix transposition (MT) is proposed, which efficiently divides a large-size FFT into several small-size FFTs that can be executed in parallel. Second, guided by this algorithm, the FFT hardware accelerator is designed, and several performance optimization techniques are proposed: hybrid twiddle factor generation, multibank data memory, block MT, and token-based task scheduling. Third, the VLSI implementation is detailed, showing that the accelerator can operate at 1 GHz with an area of 2.4 mm^2 and a power consumption of 91.3 mW at 25 °C and 0.9 V. Finally, several experiments evaluate the proposal's performance in terms of FFT execution time, resource utilization, and power consumption. Comparative experiments show that the FFT hardware accelerator achieves speedups of up to 18.89× over two software-only solutions and two dedicated hardware solutions.
Execution time is high
Efficiently divide a large-size FFT into several small-size FFTs
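The idea of dividing one large FFT into small FFTs linked by matrix transpositions can be illustrated with the textbook Cooley–Tukey factorization N = N1 × N2. The sketch below is not the accelerator's algorithm, whose details are not given here; it is a minimal software illustration under stated assumptions. The function names (`dft`, `transpose`, `fft_by_transposition`) are hypothetical, and a naive DFT stands in for the small-size FFT units. Each list comprehension over rows marks a group of independent sub-transforms that could, in principle, run in parallel.

```python
import cmath


def dft(v):
    """Naive O(n^2) DFT, used here as a stand-in for a small-size FFT unit."""
    n = len(v)
    return [
        sum(v[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(row) for row in zip(*m)]


def fft_by_transposition(x, n1, n2):
    """Length n1*n2 DFT built from small DFTs and matrix transpositions.

    Hypothetical illustration of the textbook decomposition:
    input index n = n2*r + c, output index k = k1 + n1*k2.
    """
    n = n1 * n2
    if len(x) != n:
        raise ValueError("len(x) must equal n1 * n2")

    # View the input as an n1-by-n2 matrix: a[r][c] = x[n2*r + c].
    a = [list(x[r * n2:(r + 1) * n2]) for r in range(n1)]

    # Transpose so that each length-n1 sub-transform reads one row.
    cols = transpose(a)                      # cols[c][r]

    # n2 independent DFTs of length n1.
    b = [dft(row) for row in cols]           # b[c][k1]

    # Twiddle factors W_N^(c*k1) between the two stages.
    for c in range(n2):
        for k1 in range(n1):
            b[c][k1] *= cmath.exp(-2j * cmath.pi * c * k1 / n)

    # Transpose again so that each length-n2 sub-transform reads one row.
    bt = transpose(b)                        # bt[k1][c]

    # n1 independent DFTs of length n2.
    d = [dft(row) for row in bt]             # d[k1][k2]

    # Final transpose puts X[k1 + n1*k2] into natural order.
    dt = transpose(d)                        # dt[k2][k1]
    return [value for row in dt for value in row]
```

For example, a 12-point transform can be computed as `fft_by_transposition(x, 3, 4)`, which runs four 3-point and three 4-point sub-transforms and should agree with `dft(x)` up to floating-point rounding.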