Entity matching is one of the earliest tasks in the big data pipeline and is alarmingly exposed to unintentional biases that degrade data quality. Identifying and mitigating the biases that exist in the data, or that are introduced by the matcher, at this early stage can promote fairness in downstream tasks. This demonstration showcases FairEM360, a framework for 1) auditing the output of entity matchers across a wide range of fairness measures and paradigms, 2) providing potential explanations for the underlying causes of unfairness, and 3) resolving unfairness issues through an exploratory process with human-in-the-loop feedback, utilizing an ensemble of matchers. We aspire for FairEM360 to help establish fairness as a key consideration in the evaluation of EM pipelines.