Redundancy Is All You Need (for CSP Sparsification)

The seminal work of Benczúr and Karger demonstrated cut sparsifiers of near-linear size. Subsequent extensions have yielded sparsifiers for hypergraph cuts and, more recently, linear codes over Abelian groups. A decade ago, Kogan and Krauthgamer asked about the sparsifiability of arbitrary constraint satisfaction problems (CSPs). For this question, a trivial lower bound is the size of a non-redundant CSP instance, which admits, for each constraint, an assignment satisfying only that constraint (so that no constraint can be dropped by the sparsifier). For instance, for graph cuts, spanning trees are non-redundant instances. Our main result is that redundant clauses are sufficient for sparsification: for any CSP predicate R, every unweighted instance of CSP(R) has a sparsifier of size at most its non-redundancy (up to polylogarithmic and $1/ε$ factors). For weighted instances, we similarly pin down the sparsifiability to the so-called chain length of the predicate. These results precisely determine the extent to which any CSP can be sparsified. Our result is established in the general setting of non-linear codes, or equivalently set families, yielding a VC-type theorem for multiplicative error approximation. A key technical ingredient in our work is a novel application of the entropy method from Gilmer's recent breakthrough on the union-closed sets conjecture. As a consequence of our main theorem, a number of results in the non-redundancy literature immediately extend to CSP sparsification. We also contribute new techniques for understanding the non-redundancy of CSP predicates. By adapting methods from the matching vector codes literature in coding theory, we construct an explicit predicate whose non-redundancy lies between $Ω(n^{1.5})$ and $\widetilde{O}(n^{1.6})$, the first example with a provably non-integral exponent.
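
To make the definition of a non-redundant instance concrete, the following is a minimal brute-force sketch, not taken from the paper. It checks whether every constraint of a small Boolean instance has an assignment satisfying that constraint alone, and applies the check to the graph-cut example mentioned above. The function name `is_non_redundant`, the encoding of a predicate as a set of satisfying tuples, and the specific 4-vertex instances are all illustrative assumptions.

```python
from itertools import product


def is_non_redundant(n, constraints, predicate):
    """Brute-force check (hypothetical helper, exponential in n).

    n           -- number of Boolean variables, indexed 0..n-1
    constraints -- list of variable-index tuples (the scope of each constraint)
    predicate   -- set of value tuples that satisfy the predicate R

    Returns True if, for every constraint, some assignment satisfies
    that constraint and no other.
    """
    witnessed = set()
    for x in product((0, 1), repeat=n):
        satisfied = [
            i for i, scope in enumerate(constraints)
            if tuple(x[v] for v in scope) in predicate
        ]
        if len(satisfied) == 1:
            witnessed.add(satisfied[0])
    return len(witnessed) == len(constraints)


# Graph cut as a CSP: an edge is "satisfied" when its endpoints differ.
CUT = {(0, 1), (1, 0)}

path = [(0, 1), (1, 2), (2, 3)]   # a spanning tree (path) on 4 vertices
cycle = path + [(3, 0)]           # one extra edge closes a cycle

tree_ok = is_non_redundant(4, path, CUT)
cycle_ok = is_non_redundant(4, cycle, CUT)
```

Under these assumptions the path is non-redundant, since removing any tree edge splits the vertices into two sides that cut only that edge. The cycle is not, because every cut crosses a cycle an even number of times and so can never isolate a single edge.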
