We give algorithms with lower arithmetic operation counts for both the Walsh-Hadamard Transform (WHT) and the Discrete Fourier Transform (DFT) on inputs of power-of-2 size $N$. For the WHT, our new algorithm has an operation count of $\frac{23}{24}N \log N + O(N)$. To our knowledge, this gives the first improvement on the $N \log N$ operation count of the simple, folklore Fast Walsh-Hadamard Transform algorithm. For the DFT, our new FFT algorithm uses $\frac{15}{4}N \log N + O(N)$ real arithmetic operations. Our leading constant $\frac{15}{4} = 3.75$ improves on the leading constant of $5$ from the Cooley-Tukey algorithm from 1965, leading constant $4$ from the split-radix algorithm of Yavne from 1968, leading constant $\frac{34}{9}=3.777\ldots$ from a modification of the split-radix algorithm by Van Buskirk from 2004, and leading constant $3.76875$ from a theoretically optimized version of Van Buskirk's algorithm by Sergeev from 2017. Our new WHT algorithm takes advantage of a recent line of work on the non-rigidity of the WHT: we decompose the WHT matrix as the sum of a low-rank matrix and a sparse matrix, and then analyze the structures of these matrices to achieve a lower operation count. Our new DFT algorithm comes from a novel reduction, showing that parts of the previous best FFT algorithms can be replaced by calls to an algorithm for the WHT. Replacing the folklore WHT algorithm with our new improved algorithm leads to our improved FFT.
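For reference, the folklore Fast Walsh-Hadamard Transform that serves as the $N \log N$ baseline can be sketched as follows. This is a minimal illustration of the baseline only, not of the new $\frac{23}{24}N \log N$ algorithm; the function name `fwht` and the operation counter are assumptions introduced here, and the transform is left unnormalized with $\log$ taken base 2.

```python
def fwht(values):
    """Unnormalized Walsh-Hadamard transform of a power-of-2-length list.

    Returns (transform, ops), where ops counts the additions and
    subtractions performed. Hypothetical helper for illustration only.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of 2")
    out = list(values)
    ops = 0
    h = 1
    # log2(n) stages; each stage runs n/2 butterflies of 2 operations each.
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
                ops += 2
        h *= 2
    return out, ops


if __name__ == "__main__":
    transform, ops = fwht([1, 0, 1, 0, 0, 1, 1, 0])
    print(transform, ops)
```

Each of the $\log_2 N$ stages performs $N$ additions or subtractions, so the counter returns exactly $N \log_2 N$; this is the count that the abstract's leading constant $\frac{23}{24}$ improves on.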