Label bias occurs when the outcome of interest is not directly observable and models are instead fit to proxy labels. When the difference between the true outcome and the proxy label is correlated with the predictors, predictions can differ systematically across groups of interest. We propose Bayesian hierarchical measurement models to address these issues. When strong prior information about the measurement process is available, our approach improves accuracy and supports algorithmic fairness. When prior knowledge is limited, it allows practitioners to assess how sensitive predictions are to the unknown specification of the measurement process. This helps them gauge whether enough substantive information is available to guarantee the desired accuracy and avoid disparate predictions when using proxy outcomes. We demonstrate our approach through practical examples.
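As a minimal illustration of the mechanism described above, and not the Bayesian hierarchical model proposed here, the following sketch simulates a proxy label whose error depends on group membership. The function name `simulate_label_bias`, the binary `group` indicator, the 0.5 base rate, and the `under_report` probability are all hypothetical choices made for the example.

```python
import numpy as np


def simulate_label_bias(n=20000, under_report=0.3, seed=0):
    """Return {group: (true_rate, proxy_rate)} for a toy label-bias setup.

    Assumption for illustration only: both groups share the same true
    outcome rate, but the proxy fails to record a positive outcome with
    probability `under_report` in group 1 and never in group 0.
    """
    rng = np.random.default_rng(seed)
    group = rng.integers(0, 2, size=n)      # hypothetical binary group indicator
    true_y = rng.binomial(1, 0.5, size=n)   # unobserved true outcome

    # Measurement error that is correlated with the predictor `group`.
    missed = (group == 1) & (true_y == 1) & (rng.random(n) < under_report)
    proxy_y = np.where(missed, 0, true_y)   # observed proxy label

    rates = {}
    for g in (0, 1):
        mask = group == g
        rates[g] = (true_y[mask].mean(), proxy_y[mask].mean())
    return rates


if __name__ == "__main__":
    for g, (true_rate, proxy_rate) in simulate_label_bias().items():
        print(f"group {g}: true rate {true_rate:.3f}, proxy rate {proxy_rate:.3f}")
```

In this toy setup, a model that predicts the proxy label from group membership alone would output the group-wise proxy rate. It would therefore assign group 1 a lower predicted rate even though the true rates are the same by construction.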