Credit risk prediction is a critical problem in the consumer credit industry. Traditionally, financial institutions build credit risk prediction models from borrowers' demographic, financial, and credit history data, collectively referred to as traditional data. Recent studies have shown that alternative data, such as borrowers' mobile phone communication data, give lenders more complete and accurate profiles of borrowers' creditworthiness and thereby improve credit risk prediction performance. However, alternative data are held by external entities that are independent of financial institutions. Sharing alternative data directly with financial institutions infringes on consumer privacy, yet existing credit risk prediction studies largely overlook this issue. To address this gap, we define a new problem, privacy-preserving credit risk prediction with alternative data, which simultaneously considers three practical constraints: the privacy-preserving constraint, which protects consumer privacy; the model-confidentiality constraint, which requires the model to be learned and stored centrally at the financial institution; and the lossless constraint, which requires the learned model to maintain its predictive performance. To solve this problem, we develop PrivacyCredit, a novel privacy-preserving machine learning method. We then show theoretically that PrivacyCredit is privacy-preserving, model-confidential, and lossless. Through extensive experiments on a real-world credit dataset linked with alternative data, we demonstrate the predictive value of securely incorporating alternative data into credit risk prediction and show that PrivacyCredit achieves the same predictive performance as a model learned from the insecure plaintext combination of traditional and alternative data. We further evaluate its model-confidentiality property and computational efficiency.