Credit risk is a significant challenge in modern banking, and the ability to identify qualified cardholders among a large pool of applicants is crucial to the profitability of a bank's credit card business. In the past, screening applicants required extensive manual review, which was time-consuming and labor-intensive. Although the accuracy and reliability of the machine learning (ML) models used for this task have steadily improved, major banks continue to seek more reliable and powerful models. In this study, we use a dataset of over 40,000 records provided by a commercial bank. We compare dimensionality reduction techniques such as PCA and t-SNE for preprocessing the high-dimensional data, and we adapt and tune distributed gradient boosting models such as LightGBM and XGBoost, as well as deep models such as TabNet. Combining these techniques with SMOTEENN resampling yielded strong results. The experiments show that LightGBM combined with PCA and SMOTEENN outperforms the other models we evaluated and can help banks accurately predict potential high-quality customers.