Reasoning about floating-point arithmetic is notoriously hard. While static and dynamic analysis techniques or program repair have made significant progress, more work is still needed to make them relevant to real-world code. On the critical path to that goal is understanding what real-world floating-point code looks like. To close that knowledge gap, this paper presents the first large-scale empirical study of floating-point arithmetic usage across public GitHub repositories. We focus on statically typed languages to allow our study to scale to millions of repositories. We follow state-of the art mining practices including random sampling and filtering based on only intrinsic properties to avoid bias, and identify floating-point usage by searching for keywords in the source code, and programming language constructs (e.g., loops) by parsing the code. Our evaluation supports the claim often made in papers that floating-point arithmetic is widely used. Comparing statistics such as size and usage of certain constructs and functions, we find that benchmarks used in literature to evaluate automated reasoning techniques for floating-point arithmetic are in certain aspects representative of 'real-world' code, but not in all. We publish a dataset of 10 million real-world floating-point functions extracted from our study. We demonstrate in a case study how it may be used to identify new floating-point benchmarks and help future techniques for floating-point arithmetic to be designed and evaluated to match actual users' expectations.
翻译:浮点运算的推理分析向来以困难著称。尽管静态与动态分析技术及程序修复已取得显著进展,但要使这些技术真正适用于实际代码,仍需更多研究工作。实现该目标的关键路径在于理解真实世界浮点代码的实际形态。为填补这一知识空白,本文首次对GitHub公共代码库中的浮点运算使用情况进行了大规模实证研究。我们聚焦于静态类型语言,使研究能够扩展至数百万个代码库。研究采用前沿的挖掘方法,包括基于内在属性的随机抽样与过滤以避免偏差,通过源代码关键词搜索识别浮点运算使用情况,并通过代码解析识别编程语言结构(如循环)。我们的评估结果支持了学术文献中常提出的观点:浮点运算被广泛使用。通过比较代码规模、特定结构和函数使用情况等统计数据,我们发现文献中用于评估浮点运算自动推理技术的基准测试在某些方面能代表"真实世界"代码,但并非所有方面。我们发布了从研究中提取的1000万个真实世界浮点函数数据集,并通过案例研究展示了如何利用该数据集识别新的浮点运算基准,帮助未来浮点运算技术的设计与评估更符合实际用户的预期。