Human uplift studies, that is, studies that measure the effects of AI on human performance relative to a status quo, typically using randomized controlled trial (RCT) methodology, are increasingly used to inform deployment, governance, and safety decisions for frontier AI systems. While the methods underlying these studies are well established, their interaction with the distinctive properties of frontier AI systems remains underexamined, particularly when results inform high-stakes decisions. We present findings from interviews with 16 expert practitioners experienced in conducting human uplift studies in domains including biosecurity, cybersecurity, education, and labor. Across interviews, experts described a recurring tension between standard causal inference assumptions and the object of study itself. Rapidly evolving AI systems, shifting baselines, heterogeneous and changing user proficiency, and porous real-world settings strain the assumptions underlying internal, external, and construct validity, complicating the interpretation and appropriate use of uplift evidence. We synthesize these challenges across key stages of the human uplift research lifecycle and map them to practitioner-reported solutions, clarifying both the limits and the appropriate uses of evidence from human uplift studies in high-stakes decision-making.