With growing interest in Financial Technology (FinTech), applications such as credit default prediction (CDP) are attracting significant industrial and academic attention. CDP is the task of assessing the probability that a borrower will default on their credit obligations. It plays a crucial role in assessing the creditworthiness of individuals and businesses, enabling lenders to make informed decisions about loan approvals and risk management. In this paper, we propose a workflow-based approach to improve CDP. The workflow consists of multiple steps, each designed to leverage the strengths of a different technique used in machine learning pipelines and thus best solve the CDP task. Our approach is comprehensive and systematic. It starts with data preprocessing using Weight of Evidence encoding, a technique that performs data scaling in a single pass by removing outliers, handling missing values, and making the data uniform for models that work with different data types. Next, we train several families of learning models, introducing ensemble techniques to build more robust models and optimizing hyperparameters with multi-objective genetic algorithms that account for both predictive accuracy and financial aspects. Our research aims to contribute to the FinTech industry by providing a tool for more accurate and reliable credit risk assessment, benefiting both lenders and borrowers.
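As an illustration of the Weight of Evidence (WoE) encoding step mentioned above, the following is a minimal sketch and not the implementation used in this work. It assumes the common convention WoE = ln(share of non-defaulters / share of defaulters) per category, treats missing values (`None`) as a category of their own, and adds a small smoothing constant to avoid division by zero. The function names, the `smoothing` parameter, and the toy data are hypothetical. Numeric features would first need to be binned, which is not shown.

```python
import math
from collections import defaultdict


def woe_table(values, defaults, smoothing=0.5):
    """Return {category: WoE}; None (missing) is treated as its own category.

    Assumed convention: WoE = ln(share of non-defaulters / share of defaulters),
    so a positive WoE indicates a lower-risk category.
    """
    good = defaultdict(float)  # non-defaulters per category
    bad = defaultdict(float)   # defaulters per category
    for value, defaulted in zip(values, defaults):
        if defaulted:
            bad[value] += 1
        else:
            good[value] += 1

    categories = set(good) | set(bad)
    # Smoothed totals keep every share strictly positive.
    total_good = sum(good.values()) + smoothing * len(categories)
    total_bad = sum(bad.values()) + smoothing * len(categories)

    table = {}
    for category in categories:
        good_share = (good[category] + smoothing) / total_good
        bad_share = (bad[category] + smoothing) / total_bad
        table[category] = math.log(good_share / bad_share)
    return table


def woe_encode(values, table, unseen=0.0):
    """Replace each raw value by its WoE; unseen categories get a neutral 0.0 (an assumption)."""
    return [table.get(value, unseen) for value in values]


# Hypothetical toy data: housing status and a 0/1 default flag.
housing = ["own", "rent", "rent", None, "own", "rent", None, "own"]
defaulted = [0, 1, 1, 1, 0, 0, 0, 0]

table = woe_table(housing, defaulted)
encoded = woe_encode(housing, table)
```

After encoding, every feature, whether categorical or missing, is a real number on the same log-odds scale, which is what lets a single preprocessing step feed models that expect different data types.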