Mortgage risk assessment traditionally relies on structured financial data, which is often proprietary, confidential, and costly to obtain. In this study, we propose a novel multimodal deep learning framework that uses freely available, unstructured public data sources, including textual information, images, and sentiment scores, to generate credit scores that approximate commercial scorecards. Our framework adopts a two-phase approach. In the unimodal phase, we identify the best-performing model for each modality, i.e., BERT for text, VGG for images, and a multilayer perceptron for sentiment-based features. In the fusion phase, we introduce the capsule-based fusion network (FusionCapsNet), a novel fusion strategy inspired by capsule networks but fundamentally redesigned for multimodal integration. Unlike standard capsule networks, our method tailors the capsule mechanism to each modality and restructures the fusion process to preserve spatial, contextual, and modality-specific information. It also enables adaptive weighting, so that stronger modalities dominate the fused representation without suppressing complementary signals. Our framework incorporates sentiment analysis across distinct news categories to capture borrower and market dynamics, and employs Grad-CAM-based visualizations as an interpretability tool. These components are built into the framework by design, and our results demonstrate that they enrich contextual understanding and highlight the factors driving mortgage risk predictions. Our results show that the multimodal FusionCapsNet framework not only outperforms the individual unimodal models but also surpasses benchmark fusion strategies such as addition, concatenation, and cross-attention in terms of AUC, partial AUC, and F1 score, demonstrating clear gains in both predictive accuracy and interpretability for mortgage risk assessment.
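The abstract describes FusionCapsNet only at a high level, so the following is a minimal PyTorch sketch of how modality-specific capsules and agreement-based adaptive weighting could be combined. It assumes standard dynamic routing (Sabour et al., 2017) and hypothetical names and dimensions (e.g., 768-d BERT text features, 4096-d VGG image features, 32-d sentiment features); it is not the authors' actual implementation.

```python
# Illustrative sketch only: the routing scheme, layer names, and all dimensions
# below are assumptions, not the published FusionCapsNet architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F


def squash(s, dim=-1, eps=1e-8):
    """Capsule squashing non-linearity: keeps direction, bounds length in [0, 1)."""
    norm_sq = (s ** 2).sum(dim=dim, keepdim=True)
    return (norm_sq / (1.0 + norm_sq)) * s / torch.sqrt(norm_sq + eps)


class FusionCapsSketch(nn.Module):
    """Fuses per-modality embeddings (text, image, sentiment) via capsule routing.

    Each modality is projected into its own primary capsule; routing-by-agreement
    then sets the coupling weights, so stronger modalities dominate the fused
    representation without discarding complementary signals.
    """

    def __init__(self, in_dims=(768, 4096, 32), caps_dim=16, n_out_caps=2,
                 out_dim=16, n_iters=3):
        super().__init__()
        # One projection per modality -> primary capsule pose (assumed design).
        self.proj = nn.ModuleList([nn.Linear(d, caps_dim) for d in in_dims])
        # Transformation matrices from each modality capsule to each output capsule.
        self.W = nn.Parameter(0.01 * torch.randn(len(in_dims), n_out_caps,
                                                 out_dim, caps_dim))
        self.n_iters = n_iters

    def forward(self, feats):
        # feats: list of tensors [B, in_dims[m]], one per modality m.
        u = torch.stack([squash(p(f)) for p, f in zip(self.proj, feats)], dim=1)
        # Predictions u_hat[b, m, j, :] from modality capsule m to output capsule j.
        u_hat = torch.einsum('mjoc,bmc->bmjo', self.W, u)
        b = torch.zeros(u.size(0), u.size(1), self.W.size(1), device=u.device)
        for _ in range(self.n_iters):
            c = F.softmax(b, dim=-1)                              # coupling coefficients
            v = squash((c.unsqueeze(-1) * u_hat).sum(dim=1))      # output capsules [B, J, out_dim]
            b = b + (u_hat * v.unsqueeze(1)).sum(dim=-1)          # agreement update
        return v, c  # capsule outputs and per-modality coupling weights


if __name__ == "__main__":
    text, image, senti = torch.randn(4, 768), torch.randn(4, 4096), torch.randn(4, 32)
    v, c = FusionCapsSketch()([text, image, senti])
    risk_score = v.norm(dim=-1)   # capsule length as class evidence (e.g., default vs. non-default)
    print(risk_score.shape, c.shape)  # torch.Size([4, 2]) torch.Size([4, 3, 2])
```

The returned coupling coefficients c act as per-modality weights learned through agreement, which is one plausible way to realize the adaptive weighting described above.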