Non-Invasive Fairness in Learning through the Lens of Data Drift

Machine Learning (ML) models are widely employed to drive many modern data systems. While they are undeniably powerful tools, ML models often demonstrate imbalanced performance and unfair behaviors. The root of this problem often lies in the fact that different subpopulations commonly display divergent trends: as a learning algorithm tries to identify trends in the data, it naturally favors the trends of the majority groups, leading to a model that performs poorly and unfairly for minority populations. Our goal is to improve the fairness and trustworthiness of ML models by applying only non-invasive interventions, i.e., without altering the data or the learning algorithm. We use a simple but key insight: the divergence of trends between different populations, and, consequently, between a learned model and minority populations, is analogous to data drift, which indicates poor conformance between parts of the data and the trained model. We explore two strategies to resolve this drift, model-splitting (DifFair) and reweighing (ConFair), aiming to improve the overall conformance of models to the underlying data. Both methods introduce novel ways to employ the recently proposed data profiling primitive of Conformance Constraints. Our experimental evaluation over 7 real-world datasets shows that both DifFair and ConFair improve the fairness of ML models. We demonstrate scenarios where DifFair has an edge, though ConFair has the greatest practical impact and outperforms other baselines. Moreover, as a model-agnostic technique, ConFair stays robust when its weights are used with models other than the ones on which they were learned, which is not the case for other state-of-the-art methods.
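
The drift analogy in the abstract can be illustrated with a small sketch. It is not the paper's DifFair or ConFair algorithm, and it does not implement Conformance Constraints as originally defined. As a loudly simplified stand-in, it assumes a "constraint" is just an interval of mean ± k standard deviations on a single numeric attribute, learned from the majority group. The names `learn_bounds`, `violation`, and `mean_violation`, the parameter `k`, and the toy values are all hypothetical. The point is only that a profile learned from the majority can fit the majority well while the minority group violates it, which is the poor conformance the abstract likens to data drift.

```python
from statistics import mean, pstdev


def learn_bounds(values, k=2.0):
    """Learn a toy profile: the interval [mu - k*sigma, mu + k*sigma].

    Assumption: a one-attribute simplification used for illustration only.
    """
    mu = mean(values)
    sigma = pstdev(values)
    return mu - k * sigma, mu + k * sigma


def violation(x, bounds):
    """Return 0.0 inside the bounds, else a score in (0, 1) that grows
    with the distance of x from the nearest bound."""
    lo, hi = bounds
    if lo <= x <= hi:
        return 0.0
    dist = lo - x if x < lo else x - hi
    width = (hi - lo) or 1.0  # guard against a zero-width interval
    return dist / (dist + width)


def mean_violation(values, bounds):
    """Average violation of a group against a profile (a crude drift score)."""
    return mean(violation(x, bounds) for x in values)


# Hypothetical toy data: the minority group follows a different trend.
majority = [10.0, 11.0, 9.5, 10.5, 10.2, 9.8]
minority = [14.0, 15.0, 13.5]

# The profile is learned from the majority only, mimicking a model
# that mostly captures majority trends.
bounds = learn_bounds(majority)
drift_majority = mean_violation(majority, bounds)
drift_minority = mean_violation(minority, bounds)

print("majority drift:", drift_majority)
print("minority drift:", drift_minority)
```

In this toy setting the majority conforms to its own profile while the minority does not. A per-group profile (the intuition behind model-splitting) or per-tuple weights derived from such scores (the intuition behind reweighing) would be the natural next step, but how the paper actually does either is not specified in the abstract.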

