Data-Driven Template-Free Invariant Generation

Automatic verification of concurrent programs faces state explosion due to the exponential possible interleavings of its sequential components coupled with large or infinite state spaces. An alternative is deductive verification, where given a candidate invariant, we establish inductive invariance and show that any state satisfying the invariant is also safe. However, learning (inductive) program invariants is difficult. To this end, we propose a data-driven procedure to synthesize program invariants, where it is assumed that the program invariant is an expression that characterizes a (hopefully tight) over-approximation of the reachable program states. The main ideas of our approach are: (1) We treat a candidate invariant as a classifier separating states observed in (sampled) program traces from those speculated to be unreachable. (2) We develop an enumerative, template-free approach to learn such classifiers from positive and negative examples. At its core, our enumerative approach employs decision trees to generate expressions that do not over-fit to the observed states (and thus generalize). (3) We employ a runtime framework to monitor program executions that may refute the candidate invariant; every refutation triggers a revision of the candidate invariant. Our runtime framework can be viewed as an instance of statistical model checking, which gives us probabilistic guarantees on the candidate invariant. We also show that such in some cases, our counterexample-guided inductive synthesis approach converges (in probability) to an overapproximation of the reachable set of states. Our experimental results show that our framework excels in learning useful invariants using only a fraction of the set of reachable states for a wide variety of concurrent programs.

翻译：并发程序的自动验证面临状态爆炸问题，这是由于顺序组件间指数级可能的交错组合，以及大规模或无限状态空间所致。替代方案是演绎验证，即给定候选不变式后，建立归纳不变性并证明满足该不变式的任何状态都是安全的。然而，学习（归纳）程序不变式具有挑战性。为此，我们提出了一种数据驱动的程序不变式合成方法，假设程序不变式是表征可达程序状态（理想情况下紧致）过近似的一个表达式。我们方法的主要思想是：（1）将候选不变式视为分类器，用于区分程序轨迹中采样的状态与推测不可达的状态。（2）开发一种枚举式无模板方法，通过正例和反例学习此类分类器。其核心采用决策树生成不过度拟合观测状态（从而具有泛化性）的表达式。（3）采用运行时框架监控可能反驳候选不变式的程序执行；每次反驳都会触发候选不变式的修订。我们的运行时框架可视为统计模型检验的实例，为候选不变式提供概率性保证。我们还表明，在某些情况下，反例引导的归纳合成方法（依概率）收敛到可达状态集的过近似。实验结果表明，我们的框架仅需利用广泛并发程序中可达状态集的极小部分，即可高效学习有用的不变式。