High-Performance FFT Using Data Reorganization


The KIPS Transactions:PartA, Vol. 12, No. 3, pp. 215-222, Jun. 2005
10.3745/KIPSTA.2005.12.3.215,   PDF Download:

Abstract

The efficient utilization of cache memories is a key factor in achieving high performance for computing large signal transforms. Nonunit stride access in computation of large DFTs causes cache conflict misses, thereby resulting in poor cache performance. It leads to a severe degradation in overall performance. In this paper, we propose a dynamic data layout approach considering the memory hierarchy system. In our approach, data reorganization is performed between computation stages to reduce the number of cache misses. Also, we develop an efficient search algorithm to determine the optimal tree with the minimum execution time among possible factorization trees considering the size of DFTs and the data access stride. Our approach is applied to compute the fast Fourier Transform (FFT). Experiments were performed on Pentium 4, AthlonTM 64, Alpha 21264, UltraSPARC III. Experiment results show that our FFT achieve performance improvement of up to 3.37 times better than the previous FFT packages.


Statistics
Show / Hide Statistics

Statistics (Cumulative Counts from September 1st, 2017)
Multiple requests among the same browser session are counted as one view.
If you mouse over a chart, the values of data points will be shown.


Cite this article
[IEEE Style]
N. S. Park and Y. H. Choi, "High-Performance FFT Using Data Reorganization," The KIPS Transactions:PartA, vol. 12, no. 3, pp. 215-222, 2005. DOI: 10.3745/KIPSTA.2005.12.3.215.

[ACM Style]
Neung Soo Park and Yung Ho Choi. 2005. High-Performance FFT Using Data Reorganization. The KIPS Transactions:PartA, 12, 3, (2005), 215-222. DOI: 10.3745/KIPSTA.2005.12.3.215.