We examine the problem of obtaining fair outcomes both for individuals who choose to share optional information with machine-learned models and for those who do not consent and keep their data undisclosed. We find that these non-consenting users receive significantly lower predicted outcomes than are justified by the information they provide alone. This observation gives rise to the overlooked problem of how to ensure that users who protect their personal data are not penalized. While statistical fairness notions focus on fair outcomes between advantaged and disadvantaged groups, they fail to protect non-consenting users. To address this problem, we formalize protection requirements for models that (i) allow users to benefit from sharing optional information and (ii) do not penalize them if they keep their data undisclosed. We offer the first solution to this problem by proposing the notion of Optional Feature Fairness (OFF), which we prove to be loss-optimal under our protection requirements (i) and (ii). To learn OFF-compliant models, we devise a model-agnostic data augmentation strategy with finite-sample convergence guarantees. Finally, we extensively analyze OFF on a variety of challenging real-world tasks, models, and data sets with multiple optional features.