Accurate Residues for Floating-Point Debugging

Floating-point arithmetic is error-prone and unintuitive. Floating-point debuggers instrument programs to monitor floating-point arithmetic at run time and flag numerical issues. They estimate residues, i.e., the difference between the actual floating-point value and the ideal real value, for every floating-point value in the program. Prior work explores various approaches for computing these residues accurately and efficiently. Unfortunately, the most efficient methods, based on "error-free transformations", have a high rate of false reports, while the most accurate methods, based on high-precision arithmetic, are very slow. This paper builds on approaches based on error-free transformations and aims to improve their accuracy while preserving their efficiency. To compute residues more accurately, this paper divides residue computation into two steps (rounding error computation and residue function evaluation) and shows how to perform each step accurately via careful improvements to the current state of the art. We evaluate on 44 large scientific computing workloads, focusing on the 14 benchmarks where prior tools produce false reports: our approach eliminates false reports on 10 benchmarks and substantially reduces them on the remaining 4 benchmarks. Moreover, complex numerical issues require additional care due to absorption, where two machine-precision residues cannot both be computed accurately in a single execution. This paper introduces residue override, which re-executes the program multiple times, computing different residues in different executions and assembling a final "patchwork" execution. On 169 standard benchmarks drawn from numerical analysis papers and textbooks, residue override requires only 3.6 re-executions on average. Among the 34 benchmarks with false reports in the initial run, residue override is triggered on 29 and reduces false reports on 25, averaging 7.1 re-executions.
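As a rough illustration of the kind of building block the abstract refers to, the sketch below shows one classic error-free transformation, Knuth's TwoSum, which recovers the rounding error of a single floating-point addition using only ordinary floating-point operations. This is a minimal sketch and not the paper's implementation: the function names `two_sum` and `exact_sum_error` are hypothetical, the sign convention (exact result minus computed result) is an assumption, and the exact-rational check is included only as a reference. The last lines show absorption in the ordinary floating-point sense, where a small addend vanishes from the computed sum; how the paper's residue override handles absorption is not modeled here.

```python
from fractions import Fraction


def two_sum(a, b):
    """Knuth's TwoSum (an error-free transformation).

    Returns (s, e) where s is the rounded sum fl(a + b) and e is the
    rounding error, so that a + b == s + e holds exactly in real
    arithmetic (assuming no overflow).
    """
    s = a + b
    b_virtual = s - a          # the part of s that came from b
    a_virtual = s - b_virtual  # the part of s that came from a
    e = (a - a_virtual) + (b - b_virtual)
    return s, e


def exact_sum_error(a, b):
    """Reference value: exact (a + b) minus the rounded sum.

    Uses exact rational arithmetic, so it is slow but has no rounding.
    The sign convention (exact minus computed) is an assumption.
    """
    return Fraction(a) + Fraction(b) - Fraction(a + b)


# The cheap floating-point computation matches the exact reference.
s, e = two_sum(0.1, 0.2)
assert Fraction(e) == exact_sum_error(0.1, 0.2)

# Absorption: the small addend disappears from the rounded sum
# and survives only in the error term.
big, tiny = 1.0, 2.0 ** -60
s_abs, e_abs = two_sum(big, tiny)
assert s_abs == big and e_abs == tiny
```

The contrast between `two_sum` and `exact_sum_error` mirrors the trade-off stated above: the first costs a handful of machine operations, while the second relies on arbitrary-precision arithmetic.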
