Label bias occurs when the outcome of interest is not directly observable and instead modeling is performed with proxy labels. When the difference between the true outcome and the proxy label is correlated with predictors, this can yield systematic disparities in predictions for different groups of interest. We propose Bayesian hierarchical measurement models to address these issues. Through practical examples, we demonstrate how our approach improves accuracy and helps with algorithmic fairness.
翻译:标签偏差发生在感兴趣的结果无法直接观测,而需使用代理标签进行建模时。当真实结果与代理标签之间的差异与预测变量相关时,这可能导致不同目标群体在预测中产生系统性偏差。我们提出贝叶斯分层测量模型来解决这些问题。通过实际案例,我们展示了该方法如何提高预测准确性并促进算法公平性。