Advancements in large language models (LLMs) have renewed concerns about AI alignment: the consistency between human and AI goals and values. As various jurisdictions enact legislation on AI safety, the concept of alignment must be defined and measured across different domains. This paper proposes an experimental framework to assess whether LLMs adhere to ethical and legal standards in the relatively unexplored context of finance. We prompt nine LLMs to impersonate the CEO of a financial institution and test their willingness to misuse customer assets to repay outstanding corporate debt. Starting from a baseline configuration, we vary preferences, incentives, and constraints, and use logistic regression to estimate the impact of each adjustment. Our findings reveal significant heterogeneity across LLMs in their baseline propensity for unethical behavior. Factors such as risk aversion, profit expectations, and the regulatory environment consistently influence misalignment in ways predicted by economic theory, although the magnitude of these effects varies across LLMs. This paper highlights both the benefits and the limitations of simulation-based, ex post safety testing. Such testing can inform financial authorities and institutions seeking to ensure LLM safety, but it involves a clear trade-off between generality and cost.
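The regression step can be illustrated with a minimal sketch. It assumes that each simulated prompt yields a binary outcome (1 if the LLM chooses to misuse customer funds) and that each adjustment to the baseline is coded as a 0/1 dummy, so a logistic regression estimates how each adjustment shifts the log-odds of misaligned behavior. The treatment names (`high_risk_aversion`, `strict_regulation`), the coefficients in `TRUE_BETA`, and the simulated trials are hypothetical; they illustrate the technique only and are not the paper's specification, data, or results. The fit uses plain batch gradient ascent on the log-likelihood instead of a statistics library.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logit(X, y, lr=1.0, iters=1000):
    """Fit a logistic regression by batch gradient ascent on the mean log-likelihood.

    X: list of feature rows (intercept column included); y: list of 0/1 outcomes.
    """
    n, k = len(X), len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        for row, yi in zip(X, y):
            p = sigmoid(sum(b * x for b, x in zip(beta, row)))
            for j in range(k):
                grad[j] += (yi - p) * row[j]  # score contribution of one trial
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def simulate_trials(n, true_beta, seed=0):
    """Generate synthetic trials (hypothetical, NOT the paper's data).

    Each trial randomly toggles two illustrative treatments,
    high_risk_aversion and strict_regulation (both 0/1); the outcome is 1
    if the simulated model "misuses customer funds".
    """
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [1.0, float(rng.random() < 0.5), float(rng.random() < 0.5)]
        p = sigmoid(sum(b * x for b, x in zip(true_beta, row)))
        X.append(row)
        y.append(1 if rng.random() < p else 0)
    return X, y


# Hypothetical data-generating coefficients: intercept, risk aversion, regulation.
TRUE_BETA = [0.5, -1.0, -1.5]
X, y = simulate_trials(1000, TRUE_BETA)
beta_hat = fit_logit(X, y)
for name, b in zip(["intercept", "high_risk_aversion", "strict_regulation"], beta_hat):
    print(f"{name:>20}: coef={b:+.2f}  odds ratio={math.exp(b):.2f}")
```

A negative coefficient (odds ratio below 1) means the adjustment lowers the odds of misuse relative to the baseline. For actual inference one would normally use a statistics package that also reports standard errors, such as the `Logit` model in statsmodels.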