To use AI systems safely, users need to understand what those systems can and cannot do. However, the problem of enabling users to assess AI systems with evolving sequential decision-making (SDM) capabilities is relatively understudied. This paper presents a new approach for modeling the capabilities of black-box AI systems that can plan and act, along with the possible effects of those capabilities and the requirements for executing them in stochastic settings. We present an active-learning approach that interacts with a black-box SDM system and learns an interpretable probabilistic model describing its capabilities. Theoretical analysis of the approach identifies the conditions under which the learning process is guaranteed to converge to the correct model of the agent; empirical evaluations on different agents and simulated scenarios show that the approach is few-shot generalizable and can describe the capabilities of arbitrary black-box SDM agents in a sample-efficient manner.