Uncovering potential failure cases is a crucial step in the validation of safety-critical systems such as autonomous vehicles. Failure search may be done by logging substantial vehicle miles in either simulation or real-world testing. Because failure events are sparse, naive random search requires a significant number of vehicle operation hours to find potential system weaknesses. As a result, adaptive search techniques have been proposed to efficiently explore and uncover failure trajectories of an autonomous policy in simulation. Adaptive Stress Testing (AST) is one such method: it poses failure search as a Markov decision process and uses reinforcement learning techniques to find high-probability failures. However, this formulation requires a probability model for the actions of all agents in the environment. In systems where the environment actions are discrete and dependencies among agents exist, it may be infeasible to fully characterize the distribution or find a suitable proxy. This work proposes a data-driven approach that learns a classifier modeling how humans identify critical states, and uses this classifier to guide failure search in AST. We show that incorporating critical states into the AST framework generates failure scenarios with increased safety violations in an autonomous driving policy with a discrete action space.