Cylindrical Algebraic Decomposition (CAD) is a key proof technique for formal verification of cyber-physical systems. CAD is computationally expensive, with worst-case doubly-exponential complexity. Selecting an optimal variable ordering is paramount to efficient use of CAD. Prior work has demonstrated that machine learning can be useful in determining efficient variable orderings. Much of this work has been driven by CAD problems extracted from applications of the MetiTarski theorem prover. In this paper, we revisit this prior work and consider issues of bias in existing training and test data. We observe that the classical MetiTarski benchmarks are heavily biased towards particular variable orderings. To address this, we apply symmetries to create a new dataset containing more than 41K MetiTarski challenges designed to remove bias. Furthermore, we evaluate issues of information leakage, and test the generalizability of our models on the new dataset.
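The abstract does not spell out which symmetries are applied. One plausible reading, assumed here purely for illustration, is that each problem is duplicated under every renaming of its variables, with the preferred ordering relabeled to match. The sketch below shows that idea on a toy representation; the names `permute_polynomial` and `augment`, and the encoding of polynomials as exponent-tuple dictionaries, are hypothetical and not taken from the paper or from MetiTarski.

```python
from itertools import permutations

# Assumed toy encoding: a polynomial is a dict mapping exponent tuples to
# coefficients, e.g. x0^2*x1 - 3*x2 over three variables is
# {(2, 1, 0): 1, (0, 0, 1): -3}.

def permute_polynomial(poly, perm):
    """Rename variable i to variable perm[i] in every monomial."""
    n = len(perm)
    renamed = {}
    for exps, coeff in poly.items():
        new_exps = [0] * n
        for i, e in enumerate(exps):
            new_exps[perm[i]] = e
        renamed[tuple(new_exps)] = coeff
    return renamed

def augment(problem, best_order):
    """Yield every variable renaming of a problem with its relabeled ordering.

    problem: list of polynomials in the encoding above.
    best_order: tuple of variable indices, the preferred ordering (label).
    """
    n = len(best_order)
    for perm in permutations(range(n)):
        new_problem = [permute_polynomial(p, perm) for p in problem]
        # The label follows the renaming: variable v becomes perm[v].
        new_order = tuple(perm[v] for v in best_order)
        yield new_problem, new_order

problem = [{(2, 1, 0): 1, (0, 0, 1): -3}]
augmented = list(augment(problem, best_order=(0, 1, 2)))
labels = [order for _, order in augmented]
```

Under this assumption, a three-variable problem yields six renamed copies, and every possible ordering appears exactly once among their labels, so no ordering is favored by the way variables happened to be named in the original benchmark.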