Machine learning models are often personalized with categorical attributes that are protected, sensitive, self-reported, or costly to acquire. In this work, we show that models personalized with group attributes can reduce performance at a group level. We propose formal conditions that ensure the "fair use" of group attributes in prediction tasks by training one additional model. These conditions are collective preference guarantees: each group that provides personal data receives a tailored gain in performance in return. We present sufficient conditions to ensure fair use in empirical risk minimization and characterize failure modes in which standard practices in model development and deployment lead to fair use violations. We also conduct a comprehensive empirical study of fair use in clinical prediction tasks. Our results demonstrate the prevalence of fair use violations in practice and illustrate simple interventions to mitigate their harm.
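To make the notion of a group-level gain concrete, the sketch below shows one possible way to check it. It assumes the "one additional model" is a generic model trained without the group attribute, and it measures performance with the plain 0-1 error rate. The function names (`group_error_rates`, `personalization_gains`, `fair_use_violations`), the `tolerance` parameter, and the toy data are all hypothetical illustrations, not the formal conditions or the experimental setup of this work.

```python
from collections import defaultdict


def group_error_rates(y_true, y_pred, groups):
    """Return {group: fraction of misclassified examples in that group}."""
    errors = defaultdict(int)
    counts = defaultdict(int)
    for y, p, g in zip(y_true, y_pred, groups):
        counts[g] += 1
        errors[g] += int(y != p)
    return {g: errors[g] / counts[g] for g in counts}


def personalization_gains(y_true, generic_pred, personalized_pred, groups):
    """Per-group gain = generic error - personalized error (positive is better)."""
    generic = group_error_rates(y_true, generic_pred, groups)
    personalized = group_error_rates(y_true, personalized_pred, groups)
    return {g: generic[g] - personalized[g] for g in generic}


def fair_use_violations(gains, tolerance=0.0):
    """Groups whose gain is below -tolerance, i.e. personalization hurt them."""
    return sorted(g for g, gain in gains.items() if gain < -tolerance)


# Toy, hand-made data (assumption for illustration; not from the paper).
y_true            = [1, 0, 1, 0, 1, 0, 1, 0]
groups            = ["a", "a", "a", "a", "b", "b", "b", "b"]
generic_pred      = [1, 0, 0, 0, 1, 0, 1, 1]  # model that ignores the group attribute
personalized_pred = [1, 0, 1, 0, 1, 1, 0, 1]  # model that uses the group attribute

gains = personalization_gains(y_true, generic_pred, personalized_pred, groups)
violations = fair_use_violations(gains)
print(gains)
print(violations)
```

In this toy example the personalized model helps group `a` but hurts group `b`, which is the kind of group-level harm the abstract describes. A realistic check would use held-out data and account for sampling uncertainty rather than comparing raw error rates.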