CIll: CTI-Guided Invariant Generation via LLMs for Model Checking

Inductive invariants are crucial in model checking, yet generating effective inductive invariants automatically and efficiently remains challenging. A common approach is to iteratively analyze counterexamples to induction (CTIs) and derive invariants that rule them out, as in IC3. However, IC3's clause-based learning is limited to a CNF representation. For some designs, the resulting invariants may require a large number of clauses, which hurts scalability. We present CIll, a CTI-guided framework that leverages LLMs to synthesize invariants for model checking. CIll alternates between (bounded) correctness checking and inductiveness checking for the generated invariants. In correctness checking, CIll uses BMC to validate whether the generated invariants hold on reachable states within a given bound. In inductiveness checking, CIll checks whether the generated invariants, together with the target property, become inductive under the accumulated strengthening. When inductiveness fails, CIll extracts CTIs and provides them to the LLM. The LLM inspects the design and the CTI to propose new invariants that invalidate the CTIs. The proposed invariants are then re-validated through correctness and inductiveness checks, and the loop continues until the original property strengthened by the generated invariants becomes inductive. CIll also employs IC3 to work with the LLM for automatically discovering invariants, and uses K-Induction as a complementary engine. To improve performance, CIll applies local proof and reuses invariants learned by IC3, reducing redundant search and accelerating convergence. In our evaluation, CIll proved full compliance within RISCV-Formal framework and full accuracy of all non-M instructions in NERV and PicoRV32, whereas M extensions are proved against the RVFI ALTOPS substitute semantics provided by RISCV-Formal.

翻译：归纳不变式在模型检验中至关重要，然而自动且高效地生成有效的归纳不变式仍然具有挑战性。一种常见方法是迭代分析归纳反例（CTI）并推导出排除这些反例的不变式，如IC3算法所示。然而，IC3基于子句的学习机制受限于合取范式（CNF）表示。对于某些设计，生成的不变式可能需要大量子句，从而损害可扩展性。我们提出了CIll，一个利用大语言模型（LLM）为模型检验合成不变式的CTI引导框架。CIll在生成不变式的（有界）正确性检查与归纳性检查之间交替进行。在正确性检查中，CIll使用有界模型检验（BMC）来验证生成的不变式在给定界限内是否在可达状态上成立。在归纳性检查中，CIll检查生成的不变式与目标性质一起，在累积的强化下是否变得具有归纳性。当归纳性检查失败时，CIll提取CTI并将其提供给LLM。LLM通过分析设计与CTI，提出能够使这些CTI失效的新不变式。随后，所提出的不变式会经过正确性与归纳性的重新验证，该循环持续进行，直到由生成不变式强化后的原始性质变得具有归纳性。CIll还结合IC3与LLM协同工作以自动发现不变式，并采用K归纳法作为补充引擎。为提升性能，CIll应用局部证明并重用IC3学习到的不变式，从而减少冗余搜索并加速收敛。在我们的评估中，CIll在RISCV-Formal框架内证明了完全符合性，并在NERV和PicoRV32中证明了所有非M指令的完全正确性，而M扩展则针对RISCV-Formal提供的RVFI ALTOPS替代语义进行了验证。