Dynamic learning systems subject to selective labeling exhibit censoring, i.e., persistent negative predictions assigned to one or more subgroups of points. In applications like consumer finance, this results in groups of applicants who are persistently denied and thus never enter the training data. In this work, we formalize censoring, demonstrate how it can arise, and highlight difficulties in detecting it. We consider two safeguards against censoring, recourse and randomized exploration, both of which ensure that we collect labels for points that would otherwise go unobserved. The resulting techniques allow examples from censored groups to enter the training data and correct the model. Our results highlight the otherwise unmeasured harms of censoring and demonstrate the effectiveness of mitigation strategies across a range of data-generating processes.
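To make the mechanism concrete, the following is a minimal toy sketch of selective labeling with and without randomized exploration. It is an illustration under stated assumptions, not the formalization or experimental setup used in this work: the function name `simulate`, the two groups, the repayment rates, the initial label counts, and the 0.5 decision threshold are all hypothetical. The "model" is just a per-group running estimate of the positive rate, and a label is observed only when an applicant is approved.

```python
import random


def simulate(epsilon, rounds=2000, seed=0):
    """Toy selective-labeling loop (hypothetical setup, for illustration only).

    epsilon is the probability of approving an applicant regardless of the
    model's prediction (randomized exploration). Returns the number of
    approvals per group and the observed label counts per group.
    """
    rng = random.Random(seed)
    # Assumed true repayment probability; both groups are equally creditworthy.
    true_rate = {"A": 0.8, "B": 0.8}
    # Observed [positives, total] labels. Group B starts with a small,
    # misleading sample, so the model initially predicts negative for it.
    counts = {"A": [8, 10], "B": [0, 2]}
    approvals = {"A": 0, "B": 0}

    for _ in range(rounds):
        group = rng.choice(["A", "B"])
        positives, total = counts[group]
        predicted_positive = positives / total >= 0.5
        explore = rng.random() < epsilon
        if predicted_positive or explore:
            approvals[group] += 1
            # The label is revealed only because the applicant was approved.
            label = rng.random() < true_rate[group]
            counts[group][0] += int(label)
            counts[group][1] += 1
        # Denied applicants contribute no label, so the estimate is unchanged.
    return approvals, counts


if __name__ == "__main__":
    for eps in (0.0, 0.1):
        approvals, counts = simulate(epsilon=eps)
        print(f"epsilon={eps}: approvals={approvals}, label counts={counts}")
```

With `epsilon=0.0`, group B is never approved, so its estimate is never updated and the negative prediction persists, which is the censoring pattern described above. With a small positive `epsilon`, some group B applicants are approved at random, their labels enter the data, and the estimate can move toward the true rate.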