Understanding CDCL Solvers via Scalability Studies and Proofdoors

Over the past several decades, CDCL SAT solvers have proven remarkably effective on large industrial formulas, despite SAT being NP-complete and widely believed to be intractable. While considerable empirical research has been done on solver performance over benchmarks like the SAT competition, as well as scaling studies on random and crafted families, surprisingly little effort has gone into systematic scaling studies over industrial instances. To address this gap, we collect a large benchmark of Bounded Model Checking (BMC) instances (76,600+ across 766 families) and perform a systematic scaling study of solver performance. We observe a spectrum: some families scale linearly, others polynomially or exponentially. Building on this foundation, we study the structural parameters that have been proposed to explain this phenomenon. We first show that previously proposed parameters -- clause-variable ratio, treewidth, and community structure -- fail to discriminate between the linear and exponential regimes. By contrast, the recently proposed \emph{proofdoor} parameter explains this phenomenon well. Informally, a proofdoor is a sequence of interpolants between chunks of a formula, where each interpolant represents the solver's memoization of reasoning effort on chunks it has already analyzed. In support of the proofdoor hypothesis, we make three key contributions. First, we empirically show that CDCL solvers do compute small proofdoors for linearly-scaling BMC instances. Second, we show that for exponentially-scaling instances, sampled proofdoors scale exponentially and are typically not incrementally absorbed. Third, we show that scrambling linearly-scaling instances yields larger proofdoor sizes relative to pre-scrambling, relating poor branching order to larger proofdoor sizes and drop in solver performance.

翻译：过去几十年间，CDCL SAT求解器在处理大型工业公式方面表现出色，尽管SAT问题是NP完全问题且被普遍认为是难解的。虽然已有大量关于求解器在SAT竞赛等基准测试上性能的实证研究，以及对随机和构造系列的扩展性研究，但针对工业实例的系统性扩展研究却出人意料地少。为填补这一空白，我们收集了包含766个系列、超过76600个实例的界限模型检测（BMC）大规模基准测试，并对求解器性能进行了系统性扩展研究。我们观察到不同系列呈现出不同的扩展行为：部分呈线性扩展，部分呈多项式或指数级扩展。基于此，我们研究了用于解释该现象的结构参数。首先，我们证明先前提出的参数——子句变量比、树宽和社区结构——无法区分线性扩展与指数扩展。相比之下，近期提出的"证明门"（proofdoor）参数能很好地解释这一现象。非形式化地说，证明门是公式片段间的一系列插值序列，每个插值代表求解器对已分析片段推理过程的记忆化支持。我们通过三个关键贡献验证了证明门假设：第一，实证表明CDCL求解器确实能为线性扩展的BMC实例计算小型证明门；第二，对于指数扩展实例，采样得到的证明门呈指数级增长且通常无法渐进吸收；第三，扰乱线性扩展实例会导致证明门尺寸相较于原始实例显著增大，这揭示了劣质分支顺序与证明门尺寸增大及求解器性能下降之间的关联。