Large Language Models as Fiduciaries: A Case Study Toward Robustly Communicating With Artificial Intelligence Through Legal Standards

Artificial Intelligence (AI) is taking on increasingly autonomous roles, for example, browsing the web as a research assistant and managing money. But specifying goals and restrictions for AI behavior is difficult. Just as parties to a legal contract cannot foresee every potential "if-then" contingency of their future relationship, we cannot specify desired AI behavior for all circumstances. Legal standards facilitate robust communication of inherently vague and underspecified goals. Instructions (in the case of language models, "prompts") that employ legal standards will allow AI agents to develop a shared understanding of the spirit of a directive, one that generalizes expectations about acceptable actions to unspecified states of the world. Standards have built-in context that other goal specification languages, such as plain language and programming languages, lack. Through an empirical study of thousands of evaluation labels we constructed from U.S. court opinions, we demonstrate that large language models (LLMs) are beginning to exhibit an "understanding" of one of the most relevant legal standards for AI agents: fiduciary obligations. Performance comparisons across models suggest that, as LLMs continue to exhibit improved core capabilities, their understanding of legal standards will also continue to improve. OpenAI's latest LLM reaches 78% accuracy on our data, its previous release reaches 73%, and a model from its 2020 GPT-3 paper reaches 27% (worse than random). Our research is an initial step toward a framework for evaluating AI understanding of legal standards more broadly, and for conducting reinforcement learning with legal feedback (RLLF).
