When deployed in the real world, AI agents will inevitably face challenges that exceed their individual capabilities. A critical component of AI safety is an agent's ability to recognize when it is likely to fail in a novel situation and to yield control to a more capable expert system. Leveraging such expert assistance can significantly improve safety and performance, but because expert assistance is costly, a central challenge is determining when to consult the expert. In this paper, we explore a novel variant of this problem, termed YRC-0, in which an agent must learn to collaborate with an expert in new environments in an unsupervised manner, that is, without interacting with the expert during training. This setting motivates the development of low-cost, robust approaches to training expert-leveraging agents. To support research in this area, we introduce YRC-Bench, an open-source benchmark that instantiates YRC-0 across diverse environments. YRC-Bench provides a standardized Gym-like API, simulated experts, an evaluation pipeline, and implementations of popular baselines. Toward tackling YRC-0, we propose a validation strategy and use a proposer-validator decomposition as a diagnostic framework to evaluate a range of learning methods, offering insights that can inform future research. Codebase: https://github.com/modanesh/YRC-Bench