As Artificial Intelligence (AI) systems proliferate, the need for systematic, transparent, and actionable processes for evaluating them is growing. While many resources exist to support AI evaluation, they have several limitations. Few address both the process of designing, developing, and deploying an AI system and the outcomes it produces. Furthermore, few are end-to-end and operational, give actionable guidance, or present evidence of usability or effectiveness in practice. In this paper, we introduce a third-party AI assurance framework that addresses these gaps. We focus on third-party assurance to avoid conflicts of interest and to ensure the credibility and accountability of the process. We begin by distinguishing assurance from audits along several key dimensions. Then, following design principles, we reflect on the shortcomings of existing resources to identify a set of design requirements for AI assurance. We then construct a prototype of an assurance process that consists of (1) a responsibility assignment matrix to determine the level of involvement each stakeholder has at each stage of the AI lifecycle, (2) an interview protocol for each stakeholder of an AI system, (3) a maturity matrix to assess AI systems' adherence to best practices, and (4) a template for an assurance report that draws from more mature assurance practices in business accounting. We conduct early validation of our AI assurance framework by applying it to two distinct AI use cases (a business document tagging tool for downstream processing in a large private firm, and a housing resource allocation tool in a public agency) and by conducting six expert validation interviews. Our findings show early evidence that our AI assurance framework is sound and comprehensive, usable across different organizational contexts, and effective at identifying bespoke issues with AI systems.