On-The-Fly Static Analysis via Dynamic Bidirected Dyck Reachability

Dyck reachability is a principled, graph-based formulation of a plethora of static analyses. Bidirected graphs capture dataflow through mutable heap data and are a standard formalism for demand-driven points-to and alias analyses. The best offline algorithm runs in $O(m+n\cdot \alpha(n))$ time, where $n$ is the number of nodes, $m$ is the number of edges in the flow graph, and $\alpha$ is the inverse Ackermann function; this becomes $O(n^2)$ in the worst case. In the everyday practice of program analysis, the analyzed code is subject to continuous change, with source code being added and removed. On-the-fly static analysis under such continuous updates gives rise to dynamic Dyck reachability, where reachability queries run on a dynamically changing graph that follows program updates. Naturally, executing the offline algorithm in this online setting is inadequate, as the time required to process a single update is prohibitively large. In this work we develop a novel dynamic algorithm for bidirected Dyck reachability that has $O(n\cdot \alpha(n))$ worst-case performance per update, thus beating the $O(n^2)$ bound, and is also optimal in certain settings. We also implement our algorithm, evaluate its performance on on-the-fly data-dependence and alias analyses, and compare it with the two best known alternatives, namely (i) the optimal offline algorithm and (ii) a fully dynamic Datalog solver. Our experiments show that our dynamic algorithm is consistently, and by far, the top-performing algorithm, exhibiting speedups on the order of 1000X. The running time of each update is almost always unnoticeable to the human eye, making it ideal for the on-the-fly analysis setting.
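
To make the offline problem concrete, the following is a minimal Python sketch of bidirected Dyck reachability computed with a union-find structure. It is an illustration under stated assumptions, not the dynamic algorithm developed in this work, and it makes no claim to match the $O(m+n\cdot \alpha(n))$ bound. The assumptions are that each edge is given as a triple `(u, label, v)` denoting an open-parenthesis edge from `u` to `v`, and that the matching close-parenthesis edge from `v` to `u` is implicit. Under these assumptions, two nodes with same-label edges into the same equivalence class are Dyck-reachable from each other and can be merged. The names `UnionFind` and `dyck_components` are hypothetical.

```python
from collections import defaultdict


class UnionFind:
    """Union-find with path compression; nodes are registered lazily."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root


def dyck_components(edges):
    """Return a UnionFind whose classes are the Dyck-reachability classes.

    edges: iterable of (u, label, v), read as an open-parenthesis edge
    u -> v with an implicit matching close-parenthesis edge v -> u
    (an assumption of this sketch).
    """
    uf = UnionFind()
    # incoming[r][label] = one source of a `label` edge entering class r.
    incoming = defaultdict(dict)
    pending = []  # pairs of nodes that must end up in the same class

    for u, label, v in edges:
        uf.find(u)  # register the source node
        rv = uf.find(v)
        seen = incoming[rv].get(label)
        if seen is None:
            incoming[rv][label] = u
        else:
            # Two same-label edges enter the same class: merge the sources.
            pending.append((u, seen))

    while pending:
        a, b = pending.pop()
        ra, rb = uf.find(a), uf.find(b)
        if ra == rb:
            continue
        # Fold the smaller incoming map into the larger one.
        if len(incoming[ra]) < len(incoming[rb]):
            ra, rb = rb, ra
        uf.parent[rb] = ra
        for label, src in incoming.pop(rb).items():
            other = incoming[ra].get(label)
            if other is None:
                incoming[ra][label] = src
            else:
                # The merge exposed two same-label edges into one class.
                pending.append((src, other))
    return uf


# Usage: a and b both enter c on label 1, so they merge; that merge in
# turn forces d and e (label 2 into a and b) to merge.
example_edges = [
    ("a", 1, "c"), ("b", 1, "c"),
    ("d", 2, "a"), ("e", 2, "b"),
    ("f", 1, "d"), ("g", 2, "e"),
]
example_uf = dyck_components(example_edges)
print(example_uf.find("d") == example_uf.find("e"))
```

Rerunning a procedure of this kind from scratch after every program edit is the baseline that the abstract calls inadequate for the on-the-fly setting.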
