Toward Third-Party Assurance of AI Systems: Design Requirements, Prototype, and Early Testing

As artificial intelligence (AI) systems proliferate, the need for systematic, transparent, and actionable processes for evaluating them is growing. While many resources exist to support AI evaluation, they have several limitations. Few address both the process of designing, developing, and deploying an AI system and the outcomes it produces. Furthermore, few are end-to-end and operational, give actionable guidance, or present evidence of usability or effectiveness in practice. In this paper, we introduce a third-party AI assurance framework that addresses these gaps. We focus on third-party assurance to avoid conflicts of interest and to ensure the credibility and accountability of the process. We begin by distinguishing assurance from audits along several key dimensions. Then, following design principles, we reflect on the shortcomings of existing resources to identify a set of design requirements for AI assurance. We then construct a prototype of an assurance process that consists of (1) a responsibility assignment matrix to determine the level of involvement each stakeholder has at each stage of the AI lifecycle, (2) an interview protocol for each stakeholder of an AI system, (3) a maturity matrix to assess AI systems' adherence to best practices, and (4) a template for an assurance report that draws from more mature assurance practices in business accounting. We conduct early validation of our AI assurance framework by applying it to two distinct AI use cases -- a business document tagging tool for downstream processing in a large private firm, and a housing resource allocation tool in a public agency -- and by conducting six expert validation interviews. Our findings show early evidence that our AI assurance framework is sound and comprehensive, usable across different organizational contexts, and effective at identifying bespoke issues with AI systems.

