HardCore Generation: Generating Hard UNSAT Problems for Data Augmentation

Efficiently determining the satisfiability of a boolean equation -- known as the SAT problem for brevity -- is crucial in various industrial problems. Recently, the advent of deep learning methods has introduced significant potential for enhancing SAT solving. However, a major barrier to the advancement of this field has been the scarcity of large, realistic datasets. The majority of current public datasets are either randomly generated or extremely limited, containing only a few examples from unrelated problem families. These datasets are inadequate for meaningful training of deep learning methods. In light of this, researchers have started exploring generative techniques to create data that more accurately reflect SAT problems encountered in practical situations. These methods have so far suffered from either the inability to produce challenging SAT problems or time-scalability obstacles. In this paper we address both by identifying and manipulating the key contributors to a problem's ``hardness'', known as cores. Although some previous work has addressed cores, the time costs are unacceptably high due to the expense of traditional heuristic core detection techniques. We introduce a fast core detection procedure that uses a graph neural network. Our empirical results demonstrate that we can efficiently generate problems that remain hard to solve and retain key attributes of the original example problems. We show via experiment that the generated synthetic SAT problems can be used in a data augmentation setting to provide improved prediction of solver runtimes.

翻译：高效判定布尔方程的可满足性——简称为SAT问题——在各类工业问题中至关重要。近年来，深度学习方法的发展为提升SAT求解能力带来了显著潜力。然而，该领域发展的一个主要障碍是缺乏大规模、真实的数据集。当前大多数公开数据集要么是随机生成的，要么极其有限，仅包含来自无关问题族的少数示例。这些数据集不足以对深度学习方法进行有意义的训练。鉴于此，研究者已开始探索生成技术，以创建更准确反映实际场景中遇到的SAT问题的数据。这些方法迄今存在以下缺陷：要么无法生成具有挑战性的SAT问题，要么面临时间可扩展性障碍。本文通过识别并操纵影响问题“难度”的关键因素——即核心（cores）——来同时解决这两个问题。尽管已有一些先前工作涉及核心，但由于传统启发式核心检测技术成本高昂，其时间开销令人难以接受。我们引入了一种使用图神经网络的快速核心检测流程。实证结果表明，我们能够高效生成保持求解难度并保留原始示例问题关键属性的问题。我们通过实验证明，生成的合成SAT问题可用于数据增强场景，从而改进对求解器运行时间的预测。

相关内容

SAT

关注 0

SAT是研究者关注命题可满足性问题的理论与应用的第一次年度会议。除了简单命题可满足性外，它还包括布尔优化（如MaxSAT和伪布尔（PB）约束）、量化布尔公式（QBF）、可满足性模理论（SMT）和约束规划（CP），用于与布尔级推理有明确联系的问题。官网链接：http://sat2019.tecnico.ulisboa.pt/

【CVPR 2022】一个完全无监督的框架，从噪声和部分测量中学习图像，Robust Equivariant Imaging: a fully unsupervised framework for learning to image

专知会员服务

25+阅读 · 2022年3月3日

【NeurIPS2021】用于文本图表示学习的 GNN 嵌套 Transformer 模型：GraphFormers

专知会员服务

46+阅读 · 2021年11月24日

【亚马逊-WWW2020】不解析,生成!用于面向任务的语义分析的序列到序列体系结构，Don't Parse, Generate! A Sequence to Sequence Architecture for Task-Oriented Semantic Parsing

专知会员服务

15+阅读 · 2020年2月1日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日