Fair regression under localized demographic parity constraints

Demographic parity (DP) is a widely used group fairness criterion requiring predictive distributions to be invariant across sensitive groups. While natural in classification, full distributional DP is often overly restrictive in regression and can lead to substantial accuracy loss. We propose a relaxation of DP tailored to regression, enforcing parity only at a finite set of quantile levels and/or score thresholds. Concretely, we introduce a novel $(\ell, Z)$-fair predictor, which imposes groupwise CDF constraints of the form $F_{f \mid S=s}(z_m) = \ell_m$ for prescribed pairs $(\ell_m, z_m)$. For this setting, we derive closed-form characterizations of the optimal fair discretized predictor via a Lagrangian dual formulation and quantify the discretization cost, showing that the risk gap to the continuous optimum vanishes as the grid is refined. We further develop a model-agnostic post-processing algorithm based on two samples (labeled for learning a base regressor and unlabeled for calibration), and establish finite-sample guarantees on constraint violation and excess penalized risk. In addition, we introduce two alternative frameworks where we match group and marginal CDF values at selected score thresholds. In both settings, we provide closed-form solutions for the optimal fair discretized predictor. Experiments on synthetic and real datasets illustrate an interpretable fairness-accuracy trade-off, enabling targeted corrections at decision-relevant quantiles or thresholds while preserving predictive performance.
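To make the constraint $F_{f \mid S=s}(z_m) = \ell_m$ concrete, the following is a minimal illustrative sketch and not the optimal fair predictor characterized in this work. It applies a simple groupwise monotone piecewise-linear remapping that sends each group's empirical $\ell_m$-quantile to $z_m$, which enforces the localized CDF constraints approximately but makes no attempt to minimize risk. The function names (`group_cdf_at`, `remap_to_levels`), the synthetic data, and the chosen levels and thresholds are all hypothetical; the sketch also assumes continuous scores, so that each group's quantiles are strictly increasing.

```python
import numpy as np


def group_cdf_at(scores, groups, thresholds):
    """Empirical F_{f|S=s}(z_m) for every group s and threshold z_m."""
    thresholds = np.asarray(thresholds, dtype=float)
    return {
        s: np.array([(scores[groups == s] <= z).mean() for z in thresholds])
        for s in np.unique(groups)
    }


def remap_to_levels(scores, groups, levels, thresholds):
    """Hypothetical helper: groupwise monotone remap so that the l_m-quantile
    of each group's scores lands on z_m. Illustrates the constraint only."""
    levels = np.asarray(levels, dtype=float)
    thresholds = np.asarray(thresholds, dtype=float)
    out = np.empty_like(scores, dtype=float)
    for s in np.unique(groups):
        mask = groups == s
        f = scores[mask]
        q = np.quantile(f, levels)            # group-specific quantiles
        g = np.interp(f, q, thresholds)       # piecewise-linear between knots
        # np.interp clamps outside the knots; shift the tails instead so the
        # map stays strictly increasing and scores keep their within-group order.
        g = np.where(f < q[0], thresholds[0] - (q[0] - f), g)
        g = np.where(f > q[-1], thresholds[-1] + (f - q[-1]), g)
        out[mask] = g
    return out


# Synthetic example (assumed data): two groups with shifted score distributions.
rng = np.random.default_rng(0)
groups = rng.integers(0, 2, size=4000)
scores = rng.normal(loc=groups * 1.0, scale=1.0)

levels = [0.25, 0.50, 0.75]        # prescribed l_m
thresholds = [-0.5, 0.3, 1.0]      # prescribed z_m

fair_scores = remap_to_levels(scores, groups, levels, thresholds)
cdf_before = group_cdf_at(scores, groups, thresholds)
cdf_after = group_cdf_at(fair_scores, groups, thresholds)
```

Comparing `cdf_before` with `cdf_after` shows the group CDF values at the thresholds moving toward the prescribed levels, up to an empirical error of order $1/n_s$ per group. Parity is imposed only at the $M$ selected pairs; between thresholds the group distributions remain free to differ, which is the sense in which the constraint is localized.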
