Industrial risk prediction still relies heavily on extensively hand-tuned statistical learning methods. Real-world financial data is high-dimensional, sparse, noisy, and severely imbalanced, which poses unique challenges for applying deep neural network models effectively. In this work, we introduce FinLangNet, a novel deep learning framework for risk prediction that conceptualizes credit loan trajectories in a structure that mirrors linguistic constructs. The framework is tailored to credit risk prediction on real-world financial data and exploits structural similarities to language by adapting natural language processing techniques. It particularly emphasizes analyzing the development and predictability of mid-term credit histories through a multi-head architecture and sequences of detailed financial events. Our results show that FinLangNet surpasses traditional statistical methods in predicting credit risk and that integrating it with these methods enhances credit overdue prediction models, improving the Kolmogorov-Smirnov metric by more than 4.24%.
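For readers unfamiliar with the Kolmogorov-Smirnov (KS) metric reported above, the following is a minimal sketch of how it is commonly computed for a binary risk model: the largest gap between the cumulative share of overdue accounts and the cumulative share of non-overdue accounts as the score threshold is swept. The function name `ks_statistic`, the label convention (1 = overdue, 0 = not overdue), and the toy data are assumptions made for illustration; this is not the evaluation code used for FinLangNet.

```python
def ks_statistic(labels, scores):
    """KS statistic of a binary scorer: max |TPR - FPR| over all thresholds.

    labels: iterable of 0/1 (assumed convention: 1 = overdue).
    scores: iterable of predicted risk scores, higher = riskier.
    """
    # Rank accounts from riskiest to safest.
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present to compute KS")

    tp = fp = 0
    best = 0.0
    for i, (score, y) in enumerate(pairs):
        if y == 1:
            tp += 1
        else:
            fp += 1
        # Only evaluate once a group of tied scores is complete, so that
        # tied accounts always fall on the same side of the threshold.
        end_of_tie_group = i + 1 == len(pairs) or pairs[i + 1][0] != score
        if end_of_tie_group:
            best = max(best, abs(tp / n_pos - fp / n_neg))
    return best


if __name__ == "__main__":
    # Toy data, purely illustrative.
    y = [0, 0, 1, 1]
    s = [0.10, 0.40, 0.35, 0.80]
    print(ks_statistic(y, s))
```

A KS of 0 means the score distributions of the two classes are indistinguishable, and a KS of 1 means some threshold separates them perfectly.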