Predictive algorithms inform consequential decisions in settings where the outcome is selectively observed, depending on choices made by human decision makers. There often exist unobserved confounders that affect both the decision maker's choice and the outcome. We propose a unified methodology for the robust design and evaluation of predictive algorithms in selectively observed data under such unobserved confounding. Our approach imposes general assumptions on how much the outcome may vary on average between unselected and selected units, conditional on observed covariates and identified nuisance parameters, formalizing popular empirical strategies for imputing missing data such as proxy outcomes and instrumental variables. Under these assumptions, we develop debiased machine learning estimators for the bounds on a large class of predictive performance estimands, such as the conditional likelihood of the outcome and a predictive algorithm's mean squared error and true and false positive rates, among many others. In an administrative dataset from a large Australian financial institution, we illustrate how varying the assumptions on unobserved confounding leads to meaningful changes in default risk predictions and in evaluations of credit scores across sensitive groups.
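To make the bounding idea concrete, the following is a minimal plug-in sketch, not the debiased machine learning estimator described above. It assumes a binary outcome and assumes that the difference E[Y | D=0, X] - E[Y | D=1, X] between unselected and selected units lies in a user-chosen interval [delta_lo, delta_hi]. The function name `outcome_bounds` and all argument names are hypothetical, and the inputs `mu1` and `pi` are taken to be already-estimated nuisance functions.

```python
import numpy as np

def outcome_bounds(mu1, pi, delta_lo, delta_hi):
    """Pointwise bounds on E[Y | X] for a binary outcome Y (illustrative only).

    mu1      : estimates of E[Y | D=1, X] among selected units
    pi       : estimates of the selection probability P(D=1 | X)
    delta_lo : assumed lower limit on E[Y | D=0, X] - E[Y | D=1, X]
    delta_hi : assumed upper limit on the same difference
    """
    mu1 = np.asarray(mu1, dtype=float)
    pi = np.asarray(pi, dtype=float)
    # Law of total expectation:
    #   E[Y | X] = mu1 + (1 - pi) * (E[Y | D=0, X] - mu1)
    lower = mu1 + (1.0 - pi) * delta_lo
    upper = mu1 + (1.0 - pi) * delta_hi
    # A binary outcome has a conditional mean in [0, 1], so clipping keeps the bounds valid.
    return np.clip(lower, 0.0, 1.0), np.clip(upper, 0.0, 1.0)

# Hypothetical inputs: two covariate cells with different selection rates.
lower, upper = outcome_bounds(mu1=[0.10, 0.20], pi=[0.8, 0.5],
                              delta_lo=0.0, delta_hi=0.1)
```

Setting `delta_lo = delta_hi = 0` corresponds to assuming no unobserved confounding, in which case the bounds collapse to `mu1`. The interval widens as the selection probability falls, since more of the population is unobserved.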