Deep Reinforcement Learning (DRL) has received considerable attention from the research community in recent years. As the technology moves from game playing to practical contexts such as autonomous vehicles and robotics, it is crucial to evaluate the quality of DRL agents. In this paper, we propose a search-based approach to test such agents. Our approach, implemented in a tool called Indago, trains a classifier on failure and non-failure environment configurations resulting from the DRL training process. At testing time, the classifier serves as a surrogate model for executing the DRL agent in the environment, predicting the extent to which a given environment configuration induces a failure of the DRL agent under test. The failure prediction acts as a fitness function that guides the generation towards failure-inducing environment configurations. It also saves computation time, because the costly execution of the DRL agent in the environment is reserved for the configurations that are more likely to expose failures. Experimental results show that our search-based approach finds 50% more failures of the DRL agent than state-of-the-art techniques. Moreover, the failure-inducing environment configurations, as well as the behaviours of the DRL agent they induce, are significantly more diverse.
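The abstract does not specify Indago's classifier, search algorithm, or configuration encoding, so the following is only a minimal sketch of surrogate-guided search under stated assumptions. Environment configurations are assumed to be vectors of parameters normalised to [0, 1]. A logistic-regression model stands in for the trained classifier, and plain hill climbing stands in for the search algorithm. The names `train_surrogate`, `search_config`, `find_failures`, and the 0.5 execution threshold are hypothetical and do not reflect Indago's actual API. The sketch shows the two roles of the prediction: it is the fitness being maximised, and it gates which candidates are actually executed.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical encoding (assumption): an environment configuration is a
# fixed-length vector of parameters normalised to [0, 1].
Config = List[float]


@dataclass
class LogisticSurrogate:
    """Toy failure predictor standing in for the trained classifier (assumption)."""
    weights: List[float]
    bias: float = 0.0

    def predict(self, x: Sequence[float]) -> float:
        """Estimated probability that configuration x makes the agent fail."""
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        if z >= 0:  # numerically stable sigmoid
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)


def train_surrogate(X: List[Config], y: List[float],
                    epochs: int = 2000, lr: float = 1.0) -> LogisticSurrogate:
    """Fit the surrogate by batch gradient descent on labelled configurations
    (1.0 = failure, 0.0 = non-failure), e.g. those seen during DRL training."""
    model = LogisticSurrogate(weights=[0.0] * len(X[0]))
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(model.weights)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = model.predict(xi) - yi
            for j, v in enumerate(xi):
                grad_w[j] += err * v
            grad_b += err
        model.weights = [w - lr * g / n for w, g in zip(model.weights, grad_w)]
        model.bias -= lr * grad_b / n
    return model


def mutate(config: Config, rng: random.Random, step: float = 0.1) -> Config:
    """Perturb one randomly chosen parameter, clamped to [0, 1]."""
    out = list(config)
    i = rng.randrange(len(out))
    out[i] = min(1.0, max(0.0, out[i] + rng.uniform(-step, step)))
    return out


def search_config(surrogate: LogisticSurrogate, dim: int, rng: random.Random,
                  iterations: int = 200) -> Tuple[Config, float]:
    """Hill climbing that uses the predicted failure probability as fitness."""
    best = [rng.random() for _ in range(dim)]
    best_fit = surrogate.predict(best)
    for _ in range(iterations):
        cand = mutate(best, rng)
        fit = surrogate.predict(cand)
        if fit >= best_fit:
            best, best_fit = cand, fit
    return best, best_fit


def find_failures(surrogate: LogisticSurrogate, run_agent: Callable[[Config], bool],
                  dim: int, budget: int, threshold: float = 0.5,
                  seed: int = 0) -> List[Config]:
    """Generate `budget` candidates; run the agent only on promising ones."""
    rng = random.Random(seed)
    failures = []
    for _ in range(budget):
        config, fit = search_config(surrogate, dim, rng)
        if fit < threshold:
            continue  # skip the costly simulation for unpromising configs
        if run_agent(config):  # run_agent returns True when the agent fails
            failures.append(config)
    return failures


# Toy data (hypothetical): the simulated agent fails whenever parameter 0 > 0.7.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(100)]
y = [1.0 if x[0] > 0.7 else 0.0 for x in X]
surrogate = train_surrogate(X, y)
failures = find_failures(surrogate, run_agent=lambda c: c[0] > 0.7, dim=2, budget=5)
print(f"{len(failures)} failures found")
```

Here the surrogate is cheap to query hundreds of times per candidate, while `run_agent` (a stand-in for a full simulation episode) is called at most once per candidate and only when the predicted failure probability clears the threshold. This mirrors, in simplified form, the trade-off described above.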