A machine learning workflow to address credit default prediction

Growing interest in Financial Technology (FinTech) has brought applications such as credit default prediction (CDP) significant industrial and academic attention. CDP plays a crucial role in assessing the creditworthiness of individuals and businesses, enabling lenders to make informed decisions about loan approvals and risk management. In this paper, we propose a workflow-based approach to improve CDP, the task of assessing the probability that a borrower will default on their credit obligations. The workflow consists of multiple steps, each designed to leverage the strengths of different techniques featured in machine learning pipelines and thus best solve the CDP task. We employ a comprehensive and systematic approach that starts with data preprocessing using Weight of Evidence encoding, a technique that achieves data scaling in a single shot by removing outliers, handling missing values, and making data uniform for models that work with different data types. Next, we train several families of learning models, introducing ensemble techniques to build more robust models and hyperparameter optimization via multi-objective genetic algorithms to account for both predictive accuracy and financial aspects. Our research aims to contribute to the FinTech industry by providing a tool for more accurate and reliable credit risk assessment, benefiting both lenders and borrowers.
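To make the Weight of Evidence step concrete, the following is a minimal sketch of WoE encoding for a single categorical (or already binned) feature. It is not the authors' implementation: the function names `fit_woe` and `transform_woe`, the additive `smoothing` term, the sign convention (log of the non-default share over the default share), and the choice to treat a missing value (`None`) as its own category are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def fit_woe(values, defaults, smoothing=0.5):
    """Return a {category: WoE} table for one categorical or pre-binned feature.

    Assumed convention: WoE(c) = ln(P(c | non-default) / P(c | default)).
    None is counted as its own category, so missing values need no imputation.
    `smoothing` is a hypothetical additive term that avoids log(0) for
    categories containing only defaulters or only non-defaulters.
    """
    good = defaultdict(float)  # non-defaulters per category
    bad = defaultdict(float)   # defaulters per category
    for value, is_default in zip(values, defaults):
        if is_default:
            bad[value] += 1
        else:
            good[value] += 1

    categories = set(good) | set(bad)
    total_good = sum(good.values()) + smoothing * len(categories)
    total_bad = sum(bad.values()) + smoothing * len(categories)

    table = {}
    for c in categories:
        share_good = (good[c] + smoothing) / total_good
        share_bad = (bad[c] + smoothing) / total_bad
        table[c] = math.log(share_good / share_bad)
    return table


def transform_woe(values, table, unseen=0.0):
    """Replace each category by its WoE; unseen categories map to a neutral 0.0."""
    return [table.get(v, unseen) for v in values]


# Toy data: category "b" contains only defaulters, None marks a missing value.
feature = ["a", "a", "a", "b", "b", None]
default_flag = [0, 0, 1, 1, 1, 0]
woe_table = fit_woe(feature, default_flag)
encoded = transform_woe(feature, woe_table)
```

Under this convention, categories dominated by defaulters receive negative values and categories dominated by non-defaulters receive positive ones, so every feature ends up on the same log-odds scale regardless of its original type. A numeric feature would first need to be binned (for example by quantiles), which is also what limits the influence of outliers; the binning strategy is left out here because the abstract does not specify one.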

