Machine learning models are often personalized with categorical attributes that are protected, sensitive, self-reported, or costly to acquire. In this work, we show that models personalized with group attributes can reduce performance at the group level. We propose formal conditions to ensure the "fair use" of group attributes in prediction tasks by training one additional model, i.e., collective preference guarantees that ensure each group that provides personal data receives a tailored gain in performance in return. We present sufficient conditions to ensure fair use in empirical risk minimization and characterize failure modes that lead to fair use violations due to standard practices in model development and deployment. We present a comprehensive empirical study of fair use in clinical prediction tasks. Our results demonstrate the prevalence of fair use violations in practice and illustrate simple interventions to mitigate their harm.
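The group-level comparison described above can be illustrated with a minimal sketch. It assumes that the "one additional model" is a generic model trained without group attributes, that performance is measured by error rate, and that a group is harmed when its error rate is higher under the personalized model. The function names `group_gains` and `fair_use_violations`, the `tolerance` parameter, and the toy data are hypothetical and serve only as illustration.

```python
from collections import defaultdict


def group_gains(y_true, groups, pred_generic, pred_personalized):
    """Per-group gain from personalization.

    Gain = generic error rate - personalized error rate, computed
    separately for each group. A negative gain means the group does
    worse with the personalized model. (Illustrative assumption: the
    error rate is the performance metric.)
    """
    counts = defaultdict(int)
    err_generic = defaultdict(int)
    err_personalized = defaultdict(int)
    for y, g, p_gen, p_per in zip(y_true, groups, pred_generic, pred_personalized):
        counts[g] += 1
        err_generic[g] += int(p_gen != y)
        err_personalized[g] += int(p_per != y)
    return {g: (err_generic[g] - err_personalized[g]) / counts[g] for g in counts}


def fair_use_violations(gains, tolerance=0.0):
    """Groups whose gain falls below -tolerance (hypothetical threshold)."""
    return sorted(g for g, gain in gains.items() if gain < -tolerance)


# Toy data: two groups, four examples each (made up for illustration).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
pred_generic = [1, 0, 0, 1, 0, 0, 1, 1]
pred_personalized = [1, 0, 1, 1, 0, 1, 0, 1]

gains = group_gains(y_true, groups, pred_generic, pred_personalized)
print(gains)                       # {'A': 0.25, 'B': -0.5}
print(fair_use_violations(gains))  # ['B']
```

In this toy example, personalization improves accuracy for group A but reduces it for group B, so group B provides its attribute without receiving a gain in return. In practice the comparison would be made on held-out data and would account for sampling uncertainty.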