Efficiently determining the satisfiability of a boolean equation -- known as the SAT problem for brevity -- is crucial in various industrial problems. Recently, the advent of deep learning methods has introduced significant potential for enhancing SAT solving. However, a major barrier to the advancement of this field has been the scarcity of large, realistic datasets. The majority of current public datasets are either randomly generated or extremely limited, containing only a few examples from unrelated problem families. These datasets are inadequate for meaningful training of deep learning methods. In light of this, researchers have started exploring generative techniques to create data that more accurately reflect SAT problems encountered in practical situations. These methods have so far suffered from either the inability to produce challenging SAT problems or time-scalability obstacles. In this paper we address both by identifying and manipulating the key contributors to a problem's ``hardness'', known as cores. Although some previous work has addressed cores, the time costs are unacceptably high due to the expense of traditional heuristic core detection techniques. We introduce a fast core detection procedure that uses a graph neural network. Our empirical results demonstrate that we can efficiently generate problems that remain hard to solve and retain key attributes of the original example problems. We show via experiment that the generated synthetic SAT problems can be used in a data augmentation setting to provide improved prediction of solver runtimes.
翻译:高效判定布尔方程的可满足性——简称为SAT问题——在各类工业问题中至关重要。近年来,深度学习方法的发展为提升SAT求解能力带来了显著潜力。然而,该领域发展的一个主要障碍是缺乏大规模、真实的数据集。当前大多数公开数据集要么是随机生成的,要么极其有限,仅包含来自无关问题族的少数示例。这些数据集不足以对深度学习方法进行有意义的训练。鉴于此,研究者已开始探索生成技术,以创建更准确反映实际场景中遇到的SAT问题的数据。这些方法迄今存在以下缺陷:要么无法生成具有挑战性的SAT问题,要么面临时间可扩展性障碍。本文通过识别并操纵影响问题“难度”的关键因素——即核心(cores)——来同时解决这两个问题。尽管已有一些先前工作涉及核心,但由于传统启发式核心检测技术成本高昂,其时间开销令人难以接受。我们引入了一种使用图神经网络的快速核心检测流程。实证结果表明,我们能够高效生成保持求解难度并保留原始示例问题关键属性的问题。我们通过实验证明,生成的合成SAT问题可用于数据增强场景,从而改进对求解器运行时间的预测。