Algorithms for numerical tasks in finite precision seek to minimize both the number of floating-point operations performed and the number of bits of precision each operation requires. This paper presents an algorithm for Hermitian diagonalization requiring only $\lg(1/\varepsilon)+O(\log(n)+\log\log(1/\varepsilon))$ bits of precision, where $n$ is the size of the input matrix and $\varepsilon$ is the target error. Furthermore, it runs in near matrix multiplication time. In the general setting, the first complete stability analysis of a near matrix multiplication time algorithm for diagonalization is that of Banks et al. [BGVKS20]. They exhibit an algorithm for diagonalizing an arbitrary matrix up to $\varepsilon$ backward error using only $O(\log^4(n/\varepsilon)\log(n))$ bits of precision. This work focuses on the Hermitian setting, where we obtain a dramatically improved bound on the number of bits needed. In particular, the result is close to providing a practical bound. The exact bit count depends on the specific implementations of matrix multiplication and QR decomposition one uses, but with suitable $O(n^3)$-time implementations and $\varepsilon=10^{-15}$, $n=4000$, we show that 92 bits of precision suffice (and 59 are necessary). By comparison, for the same parameters, the analysis in [BGVKS20] does not even show that 682,916,525,000 bits suffice.
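To make the quoted figures concrete, the short sketch below evaluates only the leading term $\lg(1/\varepsilon)$ of the bound for $\varepsilon=10^{-15}$ and reports how many of the 92 sufficient and 59 necessary bits fall outside that term. The function name `leading_bits` is hypothetical, and the sketch does not attempt to reproduce the constants hidden in the $O(\log(n)+\log\log(1/\varepsilon))$ term, which the abstract does not state.

```python
import math


def leading_bits(eps: float) -> float:
    """Leading term lg(1/eps) of the precision bound, in bits."""
    return math.log2(1.0 / eps)


# Parameters and bit counts quoted in the abstract.
eps = 1e-15
sufficient_bits = 92
necessary_bits = 59

lead = leading_bits(eps)

# Bits not explained by the leading term; these are attributed to the
# lower-order O(log n + log log(1/eps)) term (constants not given here).
extra_sufficient = sufficient_bits - lead
extra_necessary = necessary_bits - lead

print(f"lg(1/eps)                    = {lead:.2f} bits")
print(f"sufficient minus leading term = {extra_sufficient:.2f} bits")
print(f"necessary minus leading term  = {extra_necessary:.2f} bits")
```

The leading term alone is just under 50 bits, so roughly 42 of the 92 sufficient bits come from the lower-order term. If "bits of precision" is read as significand width (an assumption, since the abstract does not define it), the 59-bit lower bound exceeds the 53-bit significand of IEEE 754 double precision.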