A particularly challenging problem in AI safety is providing guarantees on the behavior of high-dimensional autonomous systems. Verification approaches centered on reachability analysis fail to scale, and purely statistical approaches are constrained by their distributional assumptions about the sampling process. Instead, we pose a distributionally robust version of the statistical verification problem for black-box systems, in which our performance guarantees hold over a large family of distributions. This paper proposes a novel approach that combines active learning, uncertainty quantification, and neural network verification. A central piece of our approach is an ensemble technique called Imprecise Neural Networks, which provides the uncertainty estimates that guide active learning. The active learning uses Sherlock, an exhaustive neural-network verification tool, to collect samples. An evaluation on multiple physical simulators in the OpenAI Gym MuJoCo environments with reinforcement-learned controllers demonstrates that our approach can provide useful and scalable guarantees for high-dimensional systems.
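The idea of using ensemble uncertainty to guide active learning can be illustrated with a minimal sketch. This is not the paper's Imprecise Neural Networks and it does not call Sherlock: as a stand-in, it assumes a small ensemble of polynomial regressors fit on bootstrap resamples, treats the gap between the ensemble's lowest and highest prediction as the uncertainty, and picks the candidate with the largest gap by a plain argmax. The function names `fit_ensemble`, `prediction_bounds`, and `next_query` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def fit_ensemble(x, y, n_members=10, degree=2, seed=0):
    """Fit one polynomial per bootstrap resample of the labeled data.

    Assumption: polynomial regressors stand in for the neural networks
    of the ensemble described in the abstract.
    """
    rng = np.random.default_rng(seed)
    members = []
    for _ in range(n_members):
        # Sample indices with replacement to build a bootstrap resample.
        idx = rng.integers(0, len(x), size=len(x))
        members.append(np.polyfit(x[idx], y[idx], degree))
    return members


def prediction_bounds(members, x):
    """Return the pointwise lowest and highest prediction of the ensemble."""
    preds = np.stack([np.polyval(coeffs, x) for coeffs in members])
    return preds.min(axis=0), preds.max(axis=0)


def next_query(members, candidates):
    """Pick the candidate input where the ensemble members disagree most."""
    lower, upper = prediction_bounds(members, candidates)
    return candidates[np.argmax(upper - lower)]


# Labeled samples cover only [-1, 0]; candidates extend beyond that range.
x_labeled = np.linspace(-1.0, 0.0, 20)
y_labeled = np.sin(3.0 * x_labeled)
candidates = np.linspace(-1.0, 2.0, 31)

ensemble = fit_ensemble(x_labeled, y_labeled)
query = next_query(ensemble, candidates)
print("next point to label:", query)
```

In this sketch the selected point would then be labeled by running the black-box system and added to the training set before refitting. The abstract indicates that the paper collects samples with a neural-network verification tool instead, so the argmax over a fixed candidate grid here is only a simplification.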