Numerical software depends on fast, accurate implementations of mathematical primitives like sin, exp, and log. Modern superoptimizers can optimize floating-point kernels against a given set of such primitives, but a more fundamental question remains open: which new primitives are worth implementing in the first place? We formulate this as numerical library learning: given a workload of floating-point kernels, identify the mathematical primitives whose expert implementations would most improve speed and accuracy. Our key insight is that numerical superoptimizers already contain machinery well suited to this problem. Their search procedures happen to enumerate candidate primitives, their equivalence procedures can generalize and deduplicate candidates, and their cost models can estimate counterfactual utility: how much the workload would improve if a given primitive were available. We present GrowLibm, which repurposes the Herbie superoptimizer as a numerical library learner. GrowLibm mines candidate primitives from the superoptimizer's intermediate search results, ranks them by counterfactual utility, and prunes redundant candidates. Across three scientific applications (PROJ, CoolProp, and Basilisk), GrowLibm identifies compact, reusable primitives that can be implemented effectively using standard numerical techniques. When Herbie is extended with these expert implementations, kernel speed improves by up to 2.2x at fixed accuracy, and maximum achievable accuracy also improves, in one case from 56.0% to 93.5%. We also prototype an LLVM matcher that recognizes learned primitives in optimized IR, recovering 26 replacement sites across five PROJ projections and improving end-to-end application performance by up to 5%.
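The abstract describes the selection step (deduplicate candidates, rank by counterfactual utility, prune redundancy) only at a high level. The sketch below illustrates one way these pieces could fit together; it is not GrowLibm's or Herbie's implementation. It assumes a toy cost model in which each kernel maps the set of primitives a rewrite needs to that rewrite's cost (the empty set being the baseline), assumes candidates arrive already generalized to a canonical form, and uses greedy marginal-gain selection as an illustrative pruning strategy. The names `Candidate`, `workload_cost`, and `learn_library` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str       # hypothetical primitive name
    canonical: str  # generalized form used to detect duplicates (assumed given)


# Toy cost model (an assumption): each kernel is a dict mapping a frozenset of
# primitive names to the cost of the best rewrite that uses only those
# primitives. The empty frozenset is the baseline rewrite using libm alone.
def best_cost(kernel, available):
    return min(cost for needed, cost in kernel.items() if needed <= available)


def workload_cost(workload, available):
    return sum(best_cost(kernel, available) for kernel in workload)


def learn_library(workload, candidates, k):
    """Greedily pick up to k primitives by marginal counterfactual utility."""
    # Deduplicate: keep the first candidate seen for each canonical form.
    pool = {}
    for c in candidates:
        pool.setdefault(c.canonical, c)

    chosen = frozenset()
    library = []
    for _ in range(k):
        base = workload_cost(workload, chosen)
        best, best_gain = None, 0.0
        for c in pool.values():
            if c.name in chosen:
                continue
            # Counterfactual utility: workload cost saved if c were available,
            # measured relative to the primitives already selected.
            gain = base - workload_cost(workload, chosen | {c.name})
            if gain > best_gain:
                best, best_gain = c, gain
        if best is None:  # no remaining candidate improves the workload
            break
        chosen = chosen | {best.name}
        library.append((best.name, best_gain))
    return library


if __name__ == "__main__":
    workload = [
        {frozenset(): 10.0, frozenset({"A"}): 4.0},
        {frozenset(): 8.0, frozenset({"A"}): 6.0, frozenset({"B"}): 3.0},
        {frozenset(): 5.0, frozenset({"B"}): 4.0},
    ]
    candidates = [
        Candidate("A", "f(x)"),
        Candidate("A2", "f(x)"),  # same canonical form as A, so it is dropped
        Candidate("B", "g(x)"),
    ]
    print(learn_library(workload, candidates, 3))  # [('A', 8.0), ('B', 4.0)]
```

Measuring each candidate's gain relative to the primitives already chosen is what prunes redundancy in this sketch: in the example, B alone would save 6.0, but once A is selected its marginal saving drops to 4.0 because A already improves the second kernel.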