The Fast Reciprocal Square Root Algorithm is a well-established approximation technique consisting of two stages. First, a coarse approximation is obtained by manipulating the bit pattern of the floating-point argument using integer instructions. Second, the coarse result is refined through one or more steps, traditionally using Newtonian iteration but alternatively using improved expressions with carefully chosen numerical constants found by other authors. The algorithm was widely used before microprocessors carried built-in hardware support for computing reciprocal square roots. At the time of writing, however, there is in general no hardware acceleration for computing other fixed fractional powers. This paper generalises the algorithm to cater to all rational powers and to support any polynomial degree(s) in the refinement step(s). Under the assumption of unlimited floating-point precision, it provides a procedure which automatically constructs provably optimal constants in all of these cases. It is also shown that, under certain assumptions, the use of monic refinement polynomials yields results which are much better placed with respect to the cost/accuracy tradeoff than those obtained using general polynomials. Further extensions are also analysed, and several new best approximations are given.
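For orientation, the classical two-stage scheme described above can be sketched as follows. This is a minimal illustration of the traditional algorithm only, not of the generalised procedure or the optimal constants developed in this paper. It assumes IEEE 754 single-precision bit patterns and positive, normal inputs, and it uses the widely circulated constant 0x5F3759DF. The names `fast_rsqrt`, `float_to_bits`, and `bits_to_float` are hypothetical helpers introduced for the example. The refinement here runs in Python's double precision, so rounding behaviour differs from a pure single-precision implementation.

```python
import struct

# Widely quoted constant for the classical algorithm (an assumption of this
# sketch; it is not one of the constants derived in this paper).
MAGIC = 0x5F3759DF


def float_to_bits(x: float) -> int:
    """Reinterpret x (rounded to single precision) as a 32-bit unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(i: int) -> float:
    """Reinterpret a 32-bit unsigned integer as a single-precision float."""
    return struct.unpack("<f", struct.pack("<I", i & 0xFFFFFFFF))[0]


def fast_rsqrt(x: float, newton_steps: int = 1) -> float:
    """Approximate 1/sqrt(x) for positive, normal x."""
    # Stage 1: coarse approximation using integer operations on the bit pattern.
    i = float_to_bits(x)
    i = MAGIC - (i >> 1)
    y = bits_to_float(i)

    # Stage 2: refinement by Newton's iteration for f(y) = 1/y^2 - x.
    for _ in range(newton_steps):
        y = y * (1.5 - 0.5 * x * y * y)
    return y


if __name__ == "__main__":
    for x in (0.25, 1.0, 2.0, 4.0, 10.0, 100.0):
        exact = x ** -0.5
        approx = fast_rsqrt(x)
        print(f"x={x:7.2f}  approx={approx:.6f}  rel.err={abs(approx - exact) / exact:.2e}")
```

Calling `fast_rsqrt(x, newton_steps=0)` returns the coarse stage alone, which makes it easy to see how much accuracy each refinement step contributes.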