Fast Fourier Transform (FFT) libraries are widely used to evaluate discrete convolutions. Most FFT implementations follow some variant of the Cooley-Tukey framework, in which the transform is decomposed into butterfly operations and index-reversal permutations. While butterfly operations dominate the floating-point operation count, the memory access patterns induced by index-reversal permutations significantly degrade the FFT's arithmetic intensity. In FFT-based discrete convolution built on a Cooley-Tukey framework, the three sets of index-reversal permutations (one for each of the two forward transforms and one for the inverse transform) cancel out, which paves the way for implementations free of any permutation. To the best of our knowledge, such permutation-free variants of FFT-based discrete convolution are not commonly used in practice, which makes these kernels worth investigating. Here, we investigate permutation-avoiding convolution procedures for the multi-dimensional case within a general-radix Cooley-Tukey framework. We perform numerical experiments to benchmark the presented algorithms against state-of-the-art FFT-based convolution implementations. Our results suggest that developers of FFT libraries should consider supporting permutation-avoiding convolution kernels.
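The cancellation argument can be illustrated with a minimal sketch, assuming the simplest case: one dimension, radix 2, and circular convolution of power-of-two length. The abstract's setting is multi-dimensional and general-radix, and the function names below are hypothetical illustrations, not the authors' kernels. A decimation-in-frequency (Gentleman-Sande) forward transform takes natural-order input and leaves its spectrum in bit-reversed order. Because both operands end up in the same order, the pointwise product can be taken directly. A decimation-in-time (Cooley-Tukey) inverse transform then accepts bit-reversed input and returns natural order. No explicit reordering pass is performed anywhere.

```python
import cmath


def dif_fft_bitreversed(x):
    """Forward radix-2 decimation-in-frequency FFT (hypothetical helper).

    Takes input in natural order and leaves the spectrum in bit-reversed
    order; no reordering pass is performed. len(x) must be a power of two.
    """
    a = [complex(v) for v in x]
    n = len(a)
    span = n // 2
    while span >= 1:
        # Twiddle base for butterflies of size 2*span: exp(-2*pi*i / (2*span)).
        w_step = cmath.exp(-2j * cmath.pi / (2 * span))
        for start in range(0, n, 2 * span):
            w = 1.0
            for k in range(span):
                u, v = a[start + k], a[start + k + span]
                a[start + k] = u + v
                a[start + k + span] = (u - v) * w
                w *= w_step
        span //= 2
    return a


def dit_ifft_from_bitreversed(X):
    """Inverse radix-2 decimation-in-time FFT (hypothetical helper).

    Takes a spectrum in bit-reversed order and returns the signal in
    natural order, scaled by 1/n; again no reordering pass is needed.
    """
    a = list(X)
    n = len(a)
    span = 1
    while span < n:
        # The inverse transform uses conjugate twiddles exp(+2*pi*i / (2*span)).
        w_step = cmath.exp(2j * cmath.pi / (2 * span))
        for start in range(0, n, 2 * span):
            w = 1.0
            for k in range(span):
                u, v = a[start + k], a[start + k + span] * w
                a[start + k] = u + v
                a[start + k + span] = u - v
                w *= w_step
        span *= 2
    return [v / n for v in a]


def circular_convolve_permutation_free(x, y):
    """Circular convolution of two equal-length, power-of-two sequences.

    Both forward spectra share the same bit-reversed ordering, so the
    pointwise product is taken directly in that order, and the inverse
    DIT transform returns the result in natural order.
    """
    if not x or len(x) != len(y) or len(x) & (len(x) - 1):
        raise ValueError("inputs must have equal power-of-two length")
    X = dif_fft_bitreversed(x)
    Y = dif_fft_bitreversed(y)
    return dit_ifft_from_bitreversed([p * q for p, q in zip(X, Y)])
```

For example, zero-padding `[1, 2, 3, 4]` and `[1, 1]` to length 8 and calling `circular_convolve_permutation_free` should yield, up to floating-point rounding, the linear convolution `[1, 3, 5, 7, 4, 0, 0, 0]`. Pairing a DIF forward transform with a DIT inverse transform is what makes the orderings line up; two transforms of the same kind would still need a reordering step between them. This pure-Python version only demonstrates the data flow and says nothing about performance.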