Beyond forming the backbone of compiler optimization, static dataflow analysis has been widely applied to a variety of tasks, such as bug detection, privacy analysis, and program comprehension. Despite its importance, performing interprocedural dataflow analysis on large-scale programs is well known to be challenging. In this paper, we propose a novel distributed analysis framework that supports general interprocedural dataflow analysis. Inspired by large-scale graph processing, we devise dedicated distributed worklist algorithms for both whole-program analysis and incremental analysis. We implement these algorithms in a distributed framework called BigDataflow, which runs on a large-scale cluster. The experimental results validate the promising performance of BigDataflow: it can analyze programs with millions of lines of code in minutes. Compared with the state of the art, BigDataflow achieves much higher analysis efficiency.