Revisiting Fairness Impossibility with Endogenous Behavior

In many real-world settings, institutions can and do adjust the consequences attached to algorithmic classification decisions, such as the size of fines, sentence lengths, or benefit levels. We refer to these consequences as the stakes associated with classification. These stakes can give rise to behavioral responses to classification, as people adjust their actions in anticipation of how they will be classified. Much of the algorithmic fairness literature evaluates classification outcomes while holding behavior fixed, treating behavioral differences across groups as exogenous features of the environment. Under this assumption, the stakes of classification play no role in shaping outcomes. We revisit classic impossibility results in algorithmic fairness in a setting where people respond strategically to classification. We show that, in this environment, the well-known incompatibility between error-rate balance and predictive parity disappears, but only by potentially introducing a qualitatively different form of unequal treatment. Concretely, we construct a two-stage design in which a classifier first standardizes its statistical performance across groups, and then adjusts stakes so as to induce comparable patterns of behavior. This requires treating groups differently in the consequences attached to identical classification decisions. Our results demonstrate that fairness in strategic settings cannot be assessed solely by how algorithms map data into decisions. Rather, our analysis treats the human consequences of classification as primary design variables, introduces normative criteria governing their use, and shows that their interaction with statistical fairness criteria generates qualitatively new tradeoffs. Our aim is to make these tradeoffs precise and explicit.
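The incompatibility referred to above can be illustrated with a short numerical sketch. This is not the paper's model. It only applies Bayes' rule to a binary classifier, and the function name `ppv`, the error rates, and the base rates are hypothetical values chosen for illustration. Holding the false positive and false negative rates equal across two groups, the positive predictive value differs whenever the groups' base rates differ, and coincides once the base rates coincide. That is the sense in which inducing comparable behavior could remove the conflict.

```python
def ppv(base_rate: float, fpr: float, fnr: float) -> float:
    """Positive predictive value implied by Bayes' rule.

    base_rate: share of the group whose true label is positive
    fpr: false positive rate, fnr: false negative rate
    """
    true_pos = base_rate * (1.0 - fnr)   # mass of correctly flagged positives
    false_pos = (1.0 - base_rate) * fpr  # mass of wrongly flagged negatives
    return true_pos / (true_pos + false_pos)


# Hypothetical error rates, identical for both groups (error-rate balance).
FPR, FNR = 0.2, 0.3

# Behavior held fixed: the groups have different base rates (illustrative values).
ppv_a = ppv(0.5, FPR, FNR)
ppv_b = ppv(0.2, FPR, FNR)
print(f"unequal base rates: PPV_a={ppv_a:.3f}, PPV_b={ppv_b:.3f}")

# Assumption for illustration: behavior has been equalized across groups,
# so both groups share one base rate.
ppv_a_eq = ppv(0.3, FPR, FNR)
ppv_b_eq = ppv(0.3, FPR, FNR)
print(f"equal base rates:   PPV_a={ppv_a_eq:.3f}, PPV_b={ppv_b_eq:.3f}")
```

The sketch says nothing about how stakes would have to differ across groups to equalize behavior, which is the tradeoff the paper examines.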
