Analyzing Credit Risk Model Problems through NLP-Based Clustering and Machine Learning: Insights from Validation Reports

This paper explores the use of clustering methods and machine learning algorithms, including Natural Language Processing (NLP), to identify and classify problems in credit risk models from the textual information contained in validation reports. The study uses a unique dataset of 657 findings raised by validation teams in a large international banking group between January 2019 and December 2022. The findings are classified into nine validation dimensions and assigned a severity level by validators using their expert knowledge. The authors generate embeddings for the findings' titles and observations using four different pre-trained models: "module\_url" from TensorFlow Hub and three models from the SentenceTransformer library, namely "all-mpnet-base-v2", "all-MiniLM-L6-v2", and "paraphrase-mpnet-base-v2". The paper compares various clustering methods for grouping findings with similar characteristics, enabling the identification of common problems within each validation dimension and severity level. The results show that clustering is an effective approach for identifying and classifying credit risk model problems, with accuracy higher than 60\%. The authors also employ machine learning algorithms, including logistic regression and XGBoost, to predict the validation dimension and its severity, achieving an accuracy of 80\% with the XGBoost algorithm. Furthermore, the study identifies the top 10 words that predict a validation dimension and severity. Overall, this paper contributes by demonstrating the usefulness of clustering and machine learning for analyzing textual information in validation reports, and by providing insights into the types of problems encountered in the development and validation of credit risk models.
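As an illustration of the embedding-and-clustering step described above, the following is a minimal sketch and not the authors' implementation. It assumes the `sentence-transformers` and scikit-learn packages, uses k-means as one possible clustering method, and scores the clusters by mapping each cluster to its majority label. The abstract does not state how clustering accuracy was computed, so that scoring rule is an assumption, as are the function names `embed_findings` and `cluster_label_accuracy`.

```python
from collections import Counter

import numpy as np
from sklearn.cluster import KMeans


def embed_findings(texts, model_name="all-MiniLM-L6-v2"):
    """Encode finding texts with a pre-trained SentenceTransformer model.

    The import is done lazily so the clustering helper below can be used
    with any precomputed embeddings.
    """
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    return model.encode(list(texts))  # array of shape (n_texts, dim)


def cluster_label_accuracy(embeddings, labels, n_clusters, seed=0):
    """Cluster embeddings with k-means and score them against expert labels.

    Assumption: each cluster is assigned the most frequent label among its
    members, and accuracy is the share of findings whose cluster label
    matches the label given by the validator.
    """
    labels = np.asarray(labels)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    assignments = km.fit_predict(np.asarray(embeddings))

    predicted = labels.copy()
    for cluster_id in np.unique(assignments):
        members = assignments == cluster_id
        majority = Counter(labels[members].tolist()).most_common(1)[0][0]
        predicted[members] = majority
    return float(np.mean(predicted == labels))
```

In such a setup, one would call `embed_findings` on the concatenated title and observation of each finding, then pass the result and the validation dimensions (or severity levels) to `cluster_label_accuracy`, for example with `n_clusters=9` for the nine validation dimensions.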
