The emergence of large language models (LLMs) has opened up exciting possibilities for simulating human behavior and cognitive processes, with potential applications in domains such as marketing research and consumer behavior analysis. However, the validity of using LLMs as stand-ins for human subjects remains uncertain: glaring divergences suggest fundamentally different underlying processes, and LLM responses are highly sensitive to prompt variations. This paper presents a novel approach based on Shapley values from cooperative game theory to interpret LLM behavior and quantify the relative contribution of each prompt component to the model's output. Through two applications, a discrete choice experiment and an investigation of cognitive biases, we demonstrate how the Shapley value method can uncover what we term "token noise" effects, a phenomenon in which LLM decisions are disproportionately influenced by tokens that carry minimal informative content. This phenomenon raises concerns about the robustness and generalizability of insights obtained from LLMs in the context of human behavior simulation. Because our approach is model-agnostic, it extends to proprietary LLMs, providing practitioners and researchers with a valuable tool for strategically optimizing prompts and mitigating apparent cognitive biases. Our findings underscore the need for a more nuanced understanding of the factors driving LLM responses before relying on them as substitutes for human subjects in survey settings. We emphasize the importance of reporting results conditioned on specific prompt templates and of exercising caution when drawing parallels between human behavior and LLMs.
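The attribution idea described above can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's actual implementation: the value function `v` here is a toy stand-in for an LLM query (in practice one would prompt the model with only the components in the subset present and record its output, e.g. the probability of choosing a given option), and the component names are invented for the example.

```python
from itertools import combinations
from math import factorial

def shapley_values(components, value_fn):
    """Exact Shapley values over prompt components: each component's
    average marginal contribution to value_fn across all subsets of
    the remaining components, with the standard Shapley weights."""
    n = len(components)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # size of the subset that excludes i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_i = value_fn(frozenset(subset) | {i})
                without_i = value_fn(frozenset(subset))
                phi[i] += weight * (with_i - without_i)
    return phi

# Hypothetical prompt decomposition; in a real study each entry would
# be one component of the prompt template.
prompt_components = ["task instruction", "persona", "option labels"]

def v(subset):
    # Toy value function: imagine v(S) is the probability the model
    # picks option A given only the components in S. Here component 2
    # ("option labels") dominates the output, mimicking a
    # token-noise-like effect.
    return 0.8 if 2 in subset else 0.1

phi = shapley_values(prompt_components, v)
# phi attributes nearly all of v's variation to "option labels"
```

Because the exact computation enumerates all 2^n subsets, it is only feasible for prompts decomposed into a small number of components; larger decompositions would require sampling-based Shapley approximations.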