Alternative data can give lenders valuable insight into a borrower's creditworthiness, which could help expand credit access to underserved groups and lower costs for borrowers. But some forms of alternative data have historically been excluded from credit underwriting because they could act as illegal proxies for a protected class such as race or gender, resulting in redlining. We propose a method for applying causal inference to a supervised machine learning model to debias alternative data so that it might be used for credit underwriting. We demonstrate how our algorithm can be applied to a public credit dataset to improve model accuracy across different racial groups while providing theoretically robust nondiscrimination guarantees.