Advancements in large language models (LLMs) have renewed concerns about AI alignment, the consistency between human and AI goals and values. As various jurisdictions enact legislation on AI safety, the concept of alignment must be defined and measured across different domains. This paper proposes an experimental framework to assess whether LLMs adhere to ethical and legal standards in the relatively unexplored context of finance. We prompt twelve LLMs to impersonate the CEO of a financial institution and test their willingness to misuse customer assets to repay outstanding corporate debt. Beginning with a baseline configuration, we adjust preferences, incentives, and constraints, analyzing the impact of each adjustment with logistic regression. Our findings reveal significant heterogeneity in LLMs' baseline propensity for unethical behavior. Factors such as risk aversion, profit expectations, and regulatory environment consistently influence misalignment in ways predicted by economic theory, although the magnitude of these effects varies across LLMs. This paper highlights both the benefits and limitations of simulation-based, ex post safety testing. While such testing can inform financial authorities and institutions aiming to ensure LLM safety, there is a clear trade-off between generality and cost.
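The regression step described above can be sketched as follows. This is a minimal, hypothetical illustration of fitting a logistic regression to binary misalignment outcomes against experimental factors; the variable names, effect sizes, and synthetic data are assumptions for demonstration only, not the paper's actual dataset or results.

```python
# Hypothetical sketch: estimating how experimental factors shift an LLM's
# propensity to misuse customer assets, via logistic regression.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 500  # number of simulated prompt runs (illustrative)

# Illustrative binary factors varied across simulation runs.
risk_aversion = rng.integers(0, 2, n)      # 1 = CEO persona described as risk-averse
high_profit = rng.integers(0, 2, n)        # 1 = high expected profit from misuse
strict_regulation = rng.integers(0, 2, n)  # 1 = strict regulatory environment

# Synthetic binary outcome: 1 = the model chose to misuse customer assets.
# The coefficients below encode assumed directions of effect, consistent
# with economic intuition (profit raises misuse, regulation lowers it).
logits = -0.5 - 0.6 * risk_aversion + 0.8 * high_profit - 1.2 * strict_regulation
p = 1 / (1 + np.exp(-logits))
misuse = rng.binomial(1, p)

X = np.column_stack([risk_aversion, high_profit, strict_regulation])
model = LogisticRegression().fit(X, misuse)

# The sign of each fitted coefficient indicates the factor's direction
# of effect on the probability of misaligned behavior.
for name, coef in zip(["risk_aversion", "high_profit", "strict_regulation"],
                      model.coef_[0]):
    print(f"{name}: {coef:+.2f}")
```

In practice each row would be one prompted run of an LLM under a given configuration, with the binary response coded from the model's decision; coefficient signs and magnitudes then quantify how each adjustment shifts the propensity for unethical behavior.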