A Unified Approach for Resilience and Causal Responsibility with Integer Linear Programming (ILP) and LP Relaxations

Resilience is one of the key algorithmic problems underlying various forms of reverse data management (such as view maintenance, deletion propagation, and various interventions for fairness): What is the minimal number of tuples to delete from a database in order to remove all answers from a query? A long-open question is determining those conjunctive queries (CQs) for which this problem can be solved in guaranteed PTIME. We shed new light on this and the related problem of causal responsibility by proposing a unified Integer Linear Programming (ILP) formulation. It is unified in that it can solve both previously studied restrictions (e.g., self-join-free CQs under set semantics that allow a PTIME solution) and new cases (e.g., all CQs under set or bag semantics). It is also unified in that all queries and all instances are treated with the same approach, and the algorithm is guaranteed to terminate in PTIME for the easy cases. We prove that, for all easy self-join-free CQs, the Linear Programming (LP) relaxation of our encoding yields the same solution as the ILP, and thus standard ILP solvers are guaranteed to return the solution in PTIME. Our approach opens the door to new variants and new fine-grained analysis: 1) It also works under bag semantics, and we give the first dichotomy result for bag semantics in the problem space. 2) We give a more fine-grained analysis of the complexity of causal responsibility. 3) We recover easy instances for generally hard queries, such as instances with read-once provenance and instances that become easy because of Functional Dependencies in the data. 4) We resolve an open conjecture from PODS 2020. 5) Experiments confirm that our results indeed predict the asymptotic running times, and that our universal ILP encoding is at times even faster to solve for the PTIME cases than a previously proposed dedicated flow algorithm.
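To make the problem statement concrete, the following minimal sketch computes resilience on a toy instance. It assumes a natural hitting-set style encoding, which may differ from the authors' exact formulation: one 0/1 variable x_t per tuple t (1 means "delete t"), one constraint per witness w requiring that the x_t of its tuples sum to at least 1, and the objective of minimizing the sum of all x_t. The query Q() :- R(x, y), S(y, z), the data, and the function names `witnesses` and `resilience` are hypothetical, and a brute-force search stands in for an ILP solver so that the example needs only the standard library.

```python
from itertools import combinations


def witnesses(R, S):
    """Witnesses of the hypothetical query Q() :- R(x, y), S(y, z).

    Each witness is the set of tuples that jointly produce one query answer.
    """
    return [frozenset({("R", r), ("S", s)}) for r in R for s in S if r[1] == s[0]]


def resilience(witness_sets):
    """Return (k, deleted): a smallest set of tuples that hits every witness.

    This enumerates the feasible 0/1 assignments of the assumed encoding in
    order of increasing objective value; it is exponential in general and is
    only a stand-in for an ILP solver on tiny inputs.
    """
    tuples = sorted(set().union(*witness_sets)) if witness_sets else []
    for k in range(len(tuples) + 1):
        for chosen in combinations(tuples, k):
            deleted = set(chosen)
            # Constraint per witness: at least one of its tuples is deleted.
            if all(w & deleted for w in witness_sets):
                return k, deleted
    raise AssertionError("unreachable: deleting every tuple hits every witness")


# Toy database (hypothetical data).
R = [(1, 2), (3, 2), (4, 5)]
S = [(2, 6), (2, 7), (5, 8)]

ws = witnesses(R, S)
k, deleted = resilience(ws)
print(k, sorted(deleted))
```

On this instance the join value y = 2 connects two R-tuples with two S-tuples, so two deletions are needed there, plus one more for y = 5. Replacing the enumeration with an ILP solver, or with an LP solver after relaxing each x_t to the interval [0, 1], would keep the same variables and constraints.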
