Large Language Models (LLMs) are increasingly explored, via zero-shot prompting, as flexible alternatives to classical machine learning models for classification tasks. However, their suitability for structured tabular data remains underexplored, especially in high-stakes applications such as financial risk assessment. This study systematically compares zero-shot LLM-based classifiers with LightGBM, a state-of-the-art gradient-boosting model, on a real-world loan default prediction task. We evaluate their predictive performance, analyze feature attributions using SHAP, and assess the reliability of LLM-generated self-explanations. While LLMs can identify key financial risk indicators, their feature importance rankings diverge notably from LightGBM's, and their self-explanations often fail to align with empirical SHAP attributions. These findings highlight the limitations of LLMs as standalone models for structured financial risk prediction and raise concerns about the trustworthiness of their self-generated explanations. Our results underscore the need for explainability audits, baseline comparisons with interpretable models, and human-in-the-loop oversight when deploying LLMs in risk-sensitive financial environments.
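To make the comparison setup concrete, below is a minimal Python sketch of the kind of pipeline the abstract describes: a LightGBM baseline whose feature importance is measured with SHAP, alongside serialization of a tabular record into a zero-shot classification prompt. The feature names, synthetic data, and prompt wording are illustrative assumptions, not the study's actual configuration.

```python
# Minimal sketch (assumed setup, not the paper's pipeline): a LightGBM
# baseline with empirical SHAP attributions, plus a zero-shot prompt
# built from the same tabular record for an LLM classifier.
import numpy as np
import lightgbm as lgb
import shap

rng = np.random.default_rng(0)
# Hypothetical loan-application features; the real dataset differs.
features = ["income", "debt_to_income", "credit_history_len", "loan_amount"]
X = rng.normal(size=(500, len(features)))
# Synthetic default labels loosely driven by the debt-to-income column.
y = (X[:, 1] + 0.3 * rng.normal(size=500) > 0.5).astype(int)

# Gradient-boosting baseline and its SHAP feature attributions.
model = lgb.LGBMClassifier(n_estimators=100).fit(X, y)
sv = shap.TreeExplainer(model).shap_values(X)
if isinstance(sv, list):   # older shap returns one array per class
    sv = sv[1]
sv = np.asarray(sv)
if sv.ndim == 3:           # some shap versions stack classes on the last axis
    sv = sv[..., -1]
mean_abs = np.abs(sv).mean(axis=0)
for name, score in sorted(zip(features, mean_abs), key=lambda p: -p[1]):
    print(f"{name}: {score:.3f}")   # empirical importance ranking

def zero_shot_prompt(row):
    """Serialize one applicant's tabular features into a zero-shot prompt."""
    profile = ", ".join(f"{n}={v:.2f}" for n, v in zip(features, row))
    return (
        "You are a credit risk analyst. Given the applicant profile below, "
        "answer 'default' or 'no default' and briefly explain which "
        f"features drove your decision.\nApplicant: {profile}"
    )

print(zero_shot_prompt(X[0]))  # prompt that would be sent to the LLM
```

Comparing the LLM's stated reasons against the `mean_abs` ranking above is one simple way to audit whether its self-explanations track empirical attributions.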