Breaking the Data Barrier in Learning Symbolic Computation: A Case Study on Variable Ordering Suggestion for Cylindrical Algebraic Decomposition

Symbolic computation, powered by modern computer algebra systems, has important applications in mathematical reasoning through exact deep computations. The efficiency of symbolic computation is largely constrained by such deep computations in high dimension. This creates a fundamental barrier on labelled data acquisition if leveraging supervised deep learning to accelerate symbolic computation. Cylindrical algebraic decomposition (CAD) is a pillar symbolic computation method for reasoning with first-order logic formulas over reals with many applications in formal verification and automatic theorem proving. Variable orderings have a huge impact on its efficiency. Impeded by the difficulty to acquire abundant labelled data, existing learning-based approaches are only competitive with the best expert-based heuristics. In this work, we address this problem by designing a series of intimately connected tasks for which a large amount of annotated data can be easily obtained. We pre-train a Transformer model with these data and then fine-tune it on the datasets for CAD ordering. Experiments on publicly available CAD ordering datasets show that on average the orderings predicted by the new model are significantly better than those suggested by the best heuristic methods.

翻译：符号计算在现代计算机代数系统的驱动下，通过精确的深层计算在数学推理中具有重要应用。符号计算的效率在很大程度上受限于高维空间中的此类深层计算。若利用监督式深度学习加速符号计算，这便构成了标注数据获取的根本性障碍。圆柱代数分解（CAD）是基于实数域一阶逻辑公式推理的核心符号计算方法，在形式化验证与自动定理证明领域具有广泛应用。变量排序对其效率具有重大影响。由于难以获取充足的标注数据，现有基于学习的方法仅能与最佳专家启发式方法相竞争。本研究通过设计一系列紧密关联的任务来解决该问题，这些任务可轻松获取大量标注数据。我们利用这些数据对Transformer模型进行预训练，随后在CAD排序数据集上进行微调。在公开可用的CAD排序数据集上的实验表明，新模型预测的排序平均显著优于最佳启发式方法所建议的排序。

相关内容

排序

关注 313

排序是计算机内经常进行的一种操作，其目的是将一组“无序”的记录序列调整为“有序”的记录序列。分内部排序和外部排序。若整个排序过程不需要访问外存便能完成，则称此类排序问题为内部排序。反之，若参加排序的记录数量很大，整个序列的排序过程不可能在内存中完成，则称此类排序问题为外部排序。内部排序的过程是一个逐步扩大记录的有序序列长度的过程。

【2022新书】机器学习中的概率数值计算，412页pdf

专知会员服务

93+阅读 · 2022年7月7日

【经典书】计算机科学中的逻辑学:对系统的建模和推理，443页pdf

专知会员服务

40+阅读 · 2021年4月14日

「学习几何结构」，Ron Kimmel讲述新半监督学习方法，附PPT与视频

专知会员服务

24+阅读 · 2021年3月15日

【ICLR2020-Facebook 2020】深度学习符号化数学，Deep Learning for Symbolic Mathematics，

专知会员服务

23+阅读 · 2020年4月7日