Symbolic computation, powered by modern computer algebra systems, has important applications in mathematical reasoning through exact deep computations. The efficiency of symbolic computation is largely constrained by such deep computations in high dimension. This creates a fundamental barrier on labelled data acquisition if leveraging supervised deep learning to accelerate symbolic computation. Cylindrical algebraic decomposition (CAD) is a pillar symbolic computation method for reasoning with first-order logic formulas over reals with many applications in formal verification and automatic theorem proving. Variable orderings have a huge impact on its efficiency. Impeded by the difficulty to acquire abundant labelled data, existing learning-based approaches are only competitive with the best expert-based heuristics. In this work, we address this problem by designing a series of intimately connected tasks for which a large amount of annotated data can be easily obtained. We pre-train a Transformer model with these data and then fine-tune it on the datasets for CAD ordering. Experiments on publicly available CAD ordering datasets show that on average the orderings predicted by the new model are significantly better than those suggested by the best heuristic methods.
翻译:符号计算在现代计算机代数系统的驱动下,通过精确的深层计算在数学推理中具有重要应用。符号计算的效率在很大程度上受限于高维空间中的此类深层计算。若利用监督式深度学习加速符号计算,这便构成了标注数据获取的根本性障碍。圆柱代数分解(CAD)是基于实数域一阶逻辑公式推理的核心符号计算方法,在形式化验证与自动定理证明领域具有广泛应用。变量排序对其效率具有重大影响。由于难以获取充足的标注数据,现有基于学习的方法仅能与最佳专家启发式方法相竞争。本研究通过设计一系列紧密关联的任务来解决该问题,这些任务可轻松获取大量标注数据。我们利用这些数据对Transformer模型进行预训练,随后在CAD排序数据集上进行微调。在公开可用的CAD排序数据集上的实验表明,新模型预测的排序平均显著优于最佳启发式方法所建议的排序。