Entity matching (EM) is one of the earliest tasks in the big data pipeline and is alarmingly exposed to unintentional biases that degrade data quality. Identifying and mitigating biases that already exist in the data, or that the matcher introduces at this stage, helps promote fairness in downstream tasks. This demonstration showcases FairEM360, a framework for 1) auditing the output of entity matchers across a wide range of fairness measures and paradigms, 2) offering potential explanations for the underlying causes of unfairness, and 3) resolving unfairness issues through an exploratory, human-in-the-loop process that uses an ensemble of matchers. We hope FairEM360 will help make fairness a key consideration in the evaluation of EM pipelines.
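To make the auditing step concrete, the following is a minimal sketch of one possible group-level check: comparing each group's true positive rate against the overall rate. It is not FairEM360's API. The function names, the `(group, true_label, predicted_label)` record layout, the one-group-per-pair simplification, and the 0.2 disparity threshold are all assumptions chosen for illustration.

```python
from collections import defaultdict


def true_positive_rate(labels, preds):
    """Fraction of true matches (label 1) that the matcher predicted as matches."""
    hits = [p for y, p in zip(labels, preds) if y == 1]
    if not hits:
        return None  # undefined when the group has no true matches
    return sum(hits) / len(hits)


def audit_tpr_parity(records, threshold=0.2):
    """Audit matcher output for true-positive-rate disparity per group.

    records: iterable of (group, true_label, predicted_label), one per
             candidate record pair; labels are 0 (non-match) or 1 (match).
             Assigning a single group to each pair is a simplification.
    threshold: hypothetical cutoff; a group is flagged when its TPR falls
               more than this far below the overall TPR.
    """
    by_group = defaultdict(lambda: ([], []))
    all_labels, all_preds = [], []
    for group, y, p in records:
        by_group[group][0].append(y)
        by_group[group][1].append(p)
        all_labels.append(y)
        all_preds.append(p)

    overall = true_positive_rate(all_labels, all_preds)
    report = {}
    for group, (ys, ps) in by_group.items():
        tpr = true_positive_rate(ys, ps)
        if tpr is None or overall is None:
            continue
        disparity = overall - tpr  # positive means the group is under-served
        report[group] = {
            "tpr": tpr,
            "disparity": disparity,
            "unfair": disparity > threshold,
        }
    return overall, report


# Toy data: the matcher finds every true match in group A but only one of
# four in group B. The two non-matches do not affect the TPR.
sample = (
    [("A", 1, 1)] * 4
    + [("B", 1, 1)]
    + [("B", 1, 0)] * 3
    + [("A", 0, 0), ("B", 0, 0)]
)
overall, report = audit_tpr_parity(sample)
for group in sorted(report):
    print(group, report[group])
```

A full audit would repeat this check for other measures (for example false positive rate or positive predictive value) and for other ways of assigning groups to record pairs; the sketch covers only one measure under one grouping.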