Charts, figures, and text derived from data play an important role in decision making, from data-driven policy development to day-to-day choices informed by online articles. Making sense of these outputs, or fact-checking them, means understanding how they relate to the underlying data. Even for domain experts with access to the source code and data sets, this poses a significant challenge. In this paper we introduce a new program analysis framework that uses dynamic dependence graphs to support interactive exploration of fine-grained I/O relationships directly through computed outputs. Our main contribution is a novel notion in data provenance which we call related inputs, a relation of mutual relevance or "cognacy" which arises between inputs when they contribute to common features of the output. Queries of this form allow readers to ask questions like "What outputs use this data element, and what other data elements are used along with it?" We show how Jonsson and Tarski's concept of conjugate operators on Boolean algebras appropriately characterises the notion of cognacy in a dependence graph, and give a procedure for computing related inputs over such a graph.
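To make the related-inputs query concrete, here is a minimal sketch in Python. It is not the procedure given in the paper. It assumes the dependence graph is available as a plain list of edges and that "contributes to" can be approximated by reachability. The names `reachable`, `related_inputs`, and the toy graph are hypothetical and chosen only for illustration. The forward step collects the outputs that use the selected inputs; the backward step collects every input that those outputs use.

```python
from collections import defaultdict


def reachable(edges, sources):
    """Return every vertex reachable from `sources` by following `edges`."""
    seen, stack = set(sources), list(sources)
    while stack:
        v = stack.pop()
        for w in edges[v]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def related_inputs(deps, inputs, outputs, selected):
    """Answer "what outputs use these inputs, and what else is used with them?".

    `deps` is a list of (source, target) edges meaning "target depends on
    source". This is a reachability-based approximation (an assumption of
    this sketch), not the paper's own algorithm.
    """
    fwd, bwd = defaultdict(set), defaultdict(set)
    for src, dst in deps:
        fwd[src].add(dst)
        bwd[dst].add(src)
    used_by = reachable(fwd, selected) & outputs  # forward image
    cognate = reachable(bwd, used_by) & inputs    # backward image of that
    return used_by, cognate


# Hypothetical toy graph: two sums feed two bars; input d feeds a third bar.
inputs = {"a", "b", "c", "d"}
outputs = {"bar1", "bar2", "bar3"}
deps = [
    ("a", "sum1"), ("b", "sum1"), ("sum1", "bar1"),
    ("b", "sum2"), ("c", "sum2"), ("sum2", "bar2"),
    ("d", "bar3"),
]

used_by, cognate = related_inputs(deps, inputs, outputs, {"a"})
print(sorted(used_by))  # ['bar1']
print(sorted(cognate))  # ['a', 'b']
```

In this sketch the forward and backward images are conjugate in the Jonsson and Tarski sense: the forward image of a set of inputs X is disjoint from a set of outputs Y exactly when X is disjoint from the backward image of Y. Composing the two steps is what makes input `b` cognate with `a`, because both contribute to `bar1`.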