Data-driven decision-making is at the core of many modern applications, and understanding the data is critical to supporting trust in these decisions. However, data is dynamic and evolving, just like the real-world entities it represents. Thus, an important component of understanding data is analyzing and drawing insights from the changes it undergoes. Existing methods for exploring data change list differences exhaustively; such lists are hard for humans to interpret and offer no salient insights into change trends. For example, an explanation that semantically summarizes changes to highlight gender disparities in performance rewards is more human-consumable than a long list of employee salary changes. We demonstrate ChARLES, a system that derives semantic summaries of changes between two snapshots of an evolving database in an effective, concise, and interpretable way. Our key observation is that, while datasets often evolve through point and other small-batch updates, rich data features can reveal latent semantics that intuitively summarize the changes. Under the hood, ChARLES compares database versions, infers feasible transformations by fitting multiple regression lines over different data partitions to derive change summaries, and ranks the resulting summaries. Users can customize ChARLES to obtain their preferred explanation by navigating the accuracy-interpretability tradeoff. The system offers a proof of concept for reasoning about data evolution over real-world datasets.
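To make the pipeline described above concrete, the following is a minimal sketch of the general idea (compare two snapshots, fit one regression line per data partition, rank the candidate summaries), not ChARLES's actual implementation. The attribute names (`gender`, `dept`, `salary`), the restriction to single-attribute partitions, the helper names, and the `penalty`-based score that trades accuracy against the number of rules are all assumptions made for illustration.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:  # all x values equal: fall back to a constant shift
        return 1.0, mean_y - mean_x
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov / var_x
    return slope, mean_y - slope * mean_x


def summarize(old, new, target, partition_by=None):
    """Fit one line per partition; return (rules, mean absolute error)."""
    groups = defaultdict(list)
    for key, row in old.items():
        if key in new:  # only rows present in both snapshots
            label = row[partition_by] if partition_by else "all"
            groups[label].append((row[target], new[key][target]))
    rules, abs_err, count = {}, 0.0, 0
    for label, pairs in groups.items():
        slope, intercept = fit_line([p[0] for p in pairs], [p[1] for p in pairs])
        rules[label] = (slope, intercept)
        abs_err += sum(abs(slope * x + intercept - y) for x, y in pairs)
        count += len(pairs)
    return rules, abs_err / count


def rank(old, new, target, candidates, penalty=1.0):
    """Rank partitionings by error plus a per-rule penalty (hypothetical score)."""
    scored = []
    for attr in candidates:
        rules, err = summarize(old, new, target, attr)
        scored.append((err + penalty * len(rules), attr, rules))
    return sorted(scored, key=lambda s: s[0])


# Toy snapshots (invented data): one group's salaries grow 5%, the other's 10%.
old = {
    1: {"gender": "F", "dept": "A", "salary": 100.0},
    2: {"gender": "F", "dept": "B", "salary": 200.0},
    3: {"gender": "M", "dept": "A", "salary": 100.0},
    4: {"gender": "M", "dept": "B", "salary": 200.0},
}
new = {
    1: {"gender": "F", "dept": "A", "salary": 105.0},
    2: {"gender": "F", "dept": "B", "salary": 210.0},
    3: {"gender": "M", "dept": "A", "salary": 110.0},
    4: {"gender": "M", "dept": "B", "salary": 220.0},
}

ranking = rank(old, new, "salary", [None, "gender", "dept"])
best_score, best_attr, best_rules = ranking[0]
print(best_attr, best_rules)
```

In this sketch, raising `penalty` favors summaries with fewer rules at the cost of accuracy, which is one simple way to expose the accuracy-interpretability tradeoff to a user. With a low penalty the two-rule summary partitioned by `gender` ranks first on the toy data; with a sufficiently high penalty the single global line does.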