Deep Reinforcement Learning (DRL) has received considerable attention from the research community in recent years. As the technology moves from game playing to practical contexts, such as autonomous vehicles and robotics, it is crucial to evaluate the quality of DRL agents. In this paper, we propose a search-based approach to test such agents. Our approach, implemented in a tool called Indago, trains a classifier on failure and non-failure (i.e., pass) environment configurations resulting from the DRL training process. At testing time, the classifier serves as a surrogate model for executing the DRL agent in the environment, predicting the extent to which a given environment configuration induces a failure of the DRL agent under test. The failure prediction acts as a fitness function that guides the generation towards failure-inducing environment configurations, while saving computation time by deferring the execution of the DRL agent in the environment to those configurations that are more likely to expose failures. Experimental results show that our search-based approach finds 50% more failures of the DRL agent than state-of-the-art techniques. Moreover, such failures are, on average, 78% more diverse; similarly, the behaviors of the DRL agent induced by failure configurations are 74% more diverse.
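The abstract does not say which classifier or search algorithm Indago uses. The sketch below therefore only illustrates the general idea of surrogate-assisted test generation under stated assumptions. It assumes a configuration is a vector of parameters scaled to [0, 1], swaps in logistic regression as the classifier and random-restart hill climbing as the search, and uses a synthetic `run_agent` function as a stand-in for executing the DRL agent. All names (`train_surrogate`, `predict_failure`, `surrogate_search`, `run_agent`) are hypothetical. The key point is that the search queries only the cheap surrogate, and the agent runs only on candidates the surrogate predicts will fail.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Config = List[float]               # hypothetical: environment parameters scaled to [0, 1]
Model = Tuple[List[float], float]  # logistic-regression weights and bias (assumed classifier)


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids math.exp overflow)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_surrogate(data: Sequence[Tuple[Config, int]],
                    epochs: int = 1000, lr: float = 1.0) -> Model:
    """Fit the surrogate by batch gradient descent on log-loss (1 = failure, 0 = pass)."""
    dim, n = len(data[0][0]), len(data)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for x, y in data:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            gw = [g + err * xi for g, xi in zip(gw, x)]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict_failure(model: Model, x: Config) -> float:
    """Fitness: predicted probability that configuration x makes the agent fail."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def mutate(x: Config, rng: random.Random, sigma: float = 0.1) -> Config:
    # Gaussian perturbation, clipped to the valid parameter range
    return [min(1.0, max(0.0, xi + rng.gauss(0.0, sigma))) for xi in x]


def surrogate_search(model: Model, dim: int, run_agent: Callable[[Config], bool],
                     restarts: int = 10, steps: int = 50,
                     threshold: float = 0.5, seed: int = 0) -> Tuple[List[Config], int]:
    """Random-restart hill climbing on the surrogate's failure prediction.

    Only the surrogate is queried during the search; the expensive agent
    execution is deferred to the final candidate of each restart, and only
    if its predicted failure probability reaches `threshold`.
    """
    rng = random.Random(seed)
    failures: List[Config] = []
    executions = 0
    for _ in range(restarts):
        best = [rng.random() for _ in range(dim)]
        best_fit = predict_failure(model, best)
        for _ in range(steps):
            cand = mutate(best, rng)
            fit = predict_failure(model, cand)
            if fit > best_fit:
                best, best_fit = cand, fit
        if best_fit >= threshold:
            executions += 1
            if run_agent(best):  # confirm the predicted failure by actually running the agent
                failures.append(best)
    return failures, executions


def run_agent(x: Config) -> bool:
    # Hypothetical stand-in for running the DRL agent in the environment:
    # it "fails" when the first two parameters are jointly high.
    return x[0] + x[1] > 1.2


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic stand-in for pass/fail configurations logged during DRL training
    data = []
    for _ in range(200):
        x = [rng.random() for _ in range(3)]
        data.append((x, int(run_agent(x))))
    model = train_surrogate(data)
    failures, executions = surrogate_search(model, dim=3, run_agent=run_agent)
    print(f"{len(failures)} failures confirmed in {executions} agent executions")
```

In this toy setup, the number of agent executions is bounded by the number of restarts, however many surrogate queries the search makes. That bound is the cost saving the abstract attributes to deferring execution. A real tool would replace `run_agent` with an episode in the simulator and would likely use a richer classifier and search strategy.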