The fused multiply-add (FMA) instruction allows the radix-2 FFT butterfly to be computed in 6~FMA operations, the proven minimum. The classical factorization by Linzer and Feig~\cite{linzer1993} precomputes the ratio $\cot\theta = \cos\theta/\sin\theta$, which is singular when the twiddle factor is $W^0 = 1$ (i.e., $\sin\theta = 0$). Standard practice clamps $\sin\theta$ to a small epsilon, which degrades numerical precision. We observe that an alternative factorization using $\cos\theta$ as the outer multiplier (precomputing $\tan\theta$) avoids this particular singularity but introduces a new one at $W^{N/4}$, where $\cos\theta = 0$. We then propose a \emph{dual-select} strategy that chooses, per twiddle factor, whichever factorization yields $|\text{ratio}| \leq 1$. This eliminates all singularities, requires no epsilon clamping, and bounds the magnitude of the precomputed ratio by one for all twiddle factors. For $N = 1024$, the worst-case ratio drops from 163 (Linzer-Feig) to exactly~1.0 (dual-select), yielding a $235\times$ tighter error bound in FP16 arithmetic over 10~FFT passes. The strategy adds zero computational overhead: only the precomputed twiddle table changes.
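
As a sketch of the two factorizations, assume for illustration a butterfly with inputs $a$ and $b = b_r + i\,b_i$ and a twiddle factor $W = c + i\,s$ with $c = \cos\theta$ and $s = \sin\theta$. This notation, including the sign convention for $\theta$, is our own shorthand and may differ from the convention used elsewhere in the paper. The product $Wb$ can then be written in either of two equivalent ways:
\[
Wb \;=\; s\,\bigl[(b_r\cot\theta - b_i) + i\,(b_i\cot\theta + b_r)\bigr]
   \;=\; c\,\bigl[(b_r - b_i\tan\theta) + i\,(b_i + b_r\tan\theta)\bigr].
\]
In each form the bracketed pair costs 2~FMAs and the four real outputs of $a \pm Wb$ cost 4 more, for 6 in total. Under this notation, the dual-select rule would store for each twiddle factor an outer multiplier $m$ and a ratio $r$ given by
\[
(m, r) \;=\;
\begin{cases}
(c,\ \tan\theta), & |c| \geq |s|, \\
(s,\ \cot\theta), & \text{otherwise},
\end{cases}
\]
so that $|r| \leq 1$ in every case, with equality only when $|c| = |s|$. If the twiddle angles are taken to be $\theta_k = 2\pi k/N$ (again an assumption), the smallest nonzero angle for $N = 1024$ gives $\cot(2\pi/1024) \approx 163$, consistent with the worst-case Linzer-Feig ratio quoted above.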