The AI Evaluability Gap: The Missing Layer for Managing Risk and Sustaining Value

From arXiv: 24 pages, 9 figures. A conceptual framework paper introducing the AI Evaluability Gap, Evaluability as evidence sufficiency for governance decisions, Operational Certification, Investment Certification, and a six-property evidence lifecycle for AI governance.

Organizations deploying AI face two fundamental governance challenges: managing AI risk and sustaining AI value. Both depend on evidence whose sufficiency cannot be taken for granted. We call the shared underlying challenge the AI Evaluability Gap: the condition in which organizations lack sufficient evidence to support high-confidence governance decisions regarding either risk or value. We argue that this gap reflects a category error in current practice. Existing governance approaches focus primarily on properties of systems, such as safety, fairness, reliability, compliance, and value, while paying comparatively little attention to the evidentiary foundations required to justify decisions about those properties. We further argue that AI governance encompasses both operational decisions regarding whether a system may operate and investment decisions regarding whether it merits continued organizational resources. To address this problem, we introduce Evaluability, defined as the capability of a system to generate, maintain, and renew evidence sufficient to support high-confidence governance decisions over time. We formalize governance decisions as functions of calibrated confidence Conf(D|E) and identify six properties of evaluable evidence: observability, attributability, intervenability, verifiability, calibration, and temporal validity. The framework distinguishes Operational Certification, which relies primarily on structural evidence to justify deployment decisions, from Investment Certification, which relies primarily on causal evidence to justify continued resource allocation. We argue that evidence sufficiency is a missing layer of AI governance and that closing the AI Evaluability Gap is a prerequisite for both managing risk and sustaining value in AI-enabled organizations.
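The abstract names a calibrated confidence Conf(D|E) and six evidence properties but does not state how they combine into a decision. The following is a minimal illustrative sketch, not the paper's formalization: it assumes a decision is approved only when all six properties are satisfied and a confidence estimate meets a threshold. The names `Evidence`, `missing_properties`, `decide`, the `threshold` value of 0.9, and the three outcome labels are all hypothetical.

```python
from dataclasses import dataclass

# The six properties of evaluable evidence listed in the abstract.
PROPERTIES = (
    "observability",
    "attributability",
    "intervenability",
    "verifiability",
    "calibration",
    "temporal_validity",
)


@dataclass(frozen=True)
class Evidence:
    """Hypothetical evidence record (an assumption, not from the paper)."""
    satisfied: frozenset  # names of the properties this evidence meets
    confidence: float     # stand-in for Conf(D|E), assumed to lie in [0, 1]


def missing_properties(evidence):
    """Return the properties the evidence fails to satisfy, in canonical order."""
    return [p for p in PROPERTIES if p not in evidence.satisfied]


def decide(evidence, threshold=0.9):
    """Assumed rule: approve only if all properties hold and confidence >= threshold."""
    if missing_properties(evidence):
        return "insufficient evidence"
    if evidence.confidence >= threshold:
        return "approve"
    return "defer"


full = Evidence(frozenset(PROPERTIES), 0.95)
stale = Evidence(frozenset(PROPERTIES) - {"temporal_validity"}, 0.95)
print(decide(full))   # approve
print(decide(stale))  # insufficient evidence
```

In this sketch the same confidence value leads to different outcomes depending on whether the evidence is still temporally valid, which mirrors the abstract's point that a decision rests on the sufficiency of the evidence and not only on the properties of the system.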
