Accurate credit default prediction is challenging because the data are imbalanced and the correlation between features and labels is low. Existing default prediction studies based on gradient boosting decision trees (GBDT), deep learning techniques, and feature selection strategies achieve varying degrees of success depending on the specific task. Motivated by this, we propose Tab-Attention, a novel self-attention-based stacked generalization method for credit default prediction. This approach ensembles the view-specific knowledge contributed by multi-view feature spaces to cope with low feature correlation and class imbalance. We organize the multi-view feature spaces according to the latent linear or nonlinear strength of the relationship between features and labels. Meanwhile, the F1 score guides the model during imbalanced training toward the optimal state for identifying minority default samples. Tab-Attention improves Recall_1 and f1_1 for default intention recognition over existing GBDT-based models and advanced deep learning methods by about 32.92% and 16.05% on average, respectively, while maintaining outstanding overall performance and prediction performance for non-default samples. The proposed method ensembles essential knowledge through the self-attention mechanism, which is of great significance for building more robust prediction systems.
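To make the reported metrics concrete, the following is a minimal sketch of how recall and F1 for the default class could be computed. It assumes that Recall_1 and f1_1 denote the recall and F1 score of class 1 (default), which the abstract does not state explicitly; the function name `recall_f1_for_class` and the toy labels are hypothetical and are not taken from the paper.

```python
def recall_f1_for_class(y_true, y_pred, positive=1):
    """Return (recall, F1) for one class, here assumed to be the default class.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against empty denominators, which occur when the model
    # never predicts the minority class.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return recall, f1


# Toy imbalanced sample: 8 non-default (0) and 2 default (1) cases.
y_true = [0] * 8 + [1, 1]
# A model that always predicts "non-default" reaches 80% accuracy
# but identifies none of the default cases.
always_negative = [0] * 10
recall_1, f1_1 = recall_f1_for_class(y_true, always_negative)
```

On this toy sample both values are zero despite the high accuracy, which illustrates why a class-specific F1 score is a more informative training signal than accuracy when defaults are rare.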