The emergence of large language models (LLMs) represents a major advance in artificial intelligence (AI) research. However, the widespread use of LLMs is also coupled with significant ethical and social challenges. Previous research has pointed towards auditing as a promising governance mechanism to help ensure that AI systems are designed and deployed in ways that are ethical, legal, and technically robust. However, existing auditing procedures fail to address the governance challenges posed by LLMs, which are adaptable to a wide range of downstream tasks. To help bridge that gap, we offer three contributions in this article. First, we establish the need to develop new auditing procedures that capture the risks posed by LLMs by analysing the affordances and constraints of existing auditing procedures. Second, we outline a blueprint to audit LLMs in feasible and effective ways by drawing on best practices from IT governance and system engineering. Specifically, we propose a three-layered approach, whereby governance audits, model audits, and application audits complement and inform each other. Finally, we discuss the limitations not only of our three-layered approach but also of the prospect of auditing LLMs at all. Ultimately, this article seeks to expand the methodological toolkit available to technology providers and policymakers who wish to analyse and evaluate LLMs from technical, ethical, and legal perspectives.