Reliable numerical computation is central to scientific computing, but the floating-point arithmetic that makes large-scale models feasible is error-prone. Floating-point exceptions, such as overflow to infinity or invalid operations that produce NaN, occur frequently and can propagate silently through code, corrupting results. This paper presents FlowFPX, a toolkit for systematically debugging floating-point exceptions by recording their flow, coalescing exception contexts, and fuzzing at selected locations. These tools help scientists discover when exceptions occur and trace them back to their origin, smoothing the way to a reliable codebase.
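To make the propagation problem concrete, the toy Python sketch below shows how one overflow can turn into a NaN that absorbs every later operation, and how logging intermediate values lets us recover the step where the first non-finite value appeared. This is only an illustration of the problem under our own assumptions, not FlowFPX's implementation; the functions `run_pipeline` and `first_exception` and the step names are hypothetical.

```python
import math

def run_pipeline(x):
    """Hypothetical toy computation; returns the final value and a log of (step, value)."""
    log = []
    a = x * 1e308          # overflows to inf when |x| exceeds about 1.8
    log.append(("scale", a))
    b = a - a              # inf - inf is an invalid operation and yields nan
    log.append(("diff", b))
    c = b * 0.0 + 1.0      # nan propagates through later arithmetic
    log.append(("shift", c))
    return c, log

def first_exception(log):
    """Return the name of the first logged step whose value is NaN or +/-inf."""
    for step, value in log:
        if not math.isfinite(value):
            return step
    return None

result, log = run_pipeline(2.0)
print(result)                # prints nan
print(first_exception(log))  # prints scale
```

Note that no Python exception is raised along the way: the final result is simply NaN, and only the recorded history reveals that the problem began at the `scale` step rather than where the NaN first became visible.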