We propose a new approach for modeling and reconciling conflicting data cleaning actions. Such conflicts arise naturally in collaborative data curation settings, where multiple experts work independently and then aim to combine their efforts to improve and accelerate data cleaning. The key idea of our approach is to model conflicting updates as a formal \emph{argumentation framework} (AF). Such argumentation frameworks can be automatically analyzed and solved by translating them to a logic program $P_{AF}$ whose declarative semantics yield a transparent solution with many desirable properties, e.g., uncontroversial updates are accepted, unjustified ones are rejected, and the remaining ambiguities are exposed and presented to users for further analysis. After motivating the problem, we introduce our approach and illustrate it with a detailed running example that introduces both the well-founded and the stable semantics to help readers understand the AF solutions. We have begun to develop open-source tools and Jupyter notebooks that demonstrate the practicality of our approach. In future work, we plan to develop a toolkit for conflict resolution that can be used in conjunction with OpenRefine, a popular interactive data cleaning tool.
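To make the accepted / rejected / ambiguous distinction concrete, the following is a minimal Python sketch, not the logic-program encoding $P_{AF}$ described above. It assumes an AF is given as a set of arguments plus a set of attack pairs, and computes (i) the grounded labelling, which plays the role of the well-founded solution, and (ii) all stable extensions by brute force. The update names `A` to `D` and their attacks are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def grounded_labelling(arguments, attacks):
    """Return (accepted, rejected, undecided) under grounded semantics."""
    # Map each argument to the set of arguments that attack it.
    attackers = {a: {x for (x, y) in attacks if y == a} for a in arguments}
    accepted, rejected = set(), set()
    changed = True
    while changed:
        changed = False
        for a in arguments:
            if a in accepted or a in rejected:
                continue
            if attackers[a] <= rejected:
                # Every attacker is already rejected (or there are none).
                accepted.add(a)
                changed = True
            elif attackers[a] & accepted:
                # At least one attacker is accepted.
                rejected.add(a)
                changed = True
    undecided = set(arguments) - accepted - rejected
    return accepted, rejected, undecided


def stable_extensions(arguments, attacks):
    """Enumerate stable extensions: conflict-free sets that attack
    every argument outside the set (brute force, small AFs only)."""
    args = sorted(arguments)
    result = []
    for r in range(len(args) + 1):
        for subset in combinations(args, r):
            s = set(subset)
            conflict_free = not any((x, y) in attacks for x in s for y in s)
            attacked = {y for (x, y) in attacks if x in s}
            if conflict_free and (set(args) - s) <= attacked:
                result.append(s)
    return result


# Hypothetical example: updates A and B overwrite the same cell with
# different values (mutual attack); update C invalidates update D.
arguments = {"A", "B", "C", "D"}
attacks = {("A", "B"), ("B", "A"), ("C", "D")}

accepted, rejected, undecided = grounded_labelling(arguments, attacks)
extensions = stable_extensions(arguments, attacks)
```

In this toy AF, the grounded labelling accepts `C`, rejects `D`, and leaves `A` and `B` undecided, which is the ambiguity that would be surfaced to users. The stable extensions then list the two ways of resolving it: keep `A` with `C`, or keep `B` with `C`.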