STADA: Specification-based Testing for Autonomous Driving Agents

Simulation-based testing has become a standard approach to validating autonomous driving agents prior to real-world deployment. A high-quality validation campaign will exercise an agent in diverse contexts comprising varying static environments (e.g., lanes, intersections, and signage) and dynamic elements (e.g., vehicles and pedestrians). To achieve this, existing test generation techniques rely on template-based, manually constructed, or random scenario generation. When applied to validate formally specified safety requirements, such methods either require significant human effort or run the risk of missing important behavior related to the requirement. To address this gap, we present STADA, a Specification-based Test generation framework for Autonomous Driving Agents that systematically generates the space of scenarios defined by a formal specification expressed in linear temporal logic over finite traces (LTLf). Given a specification, STADA constructs all distinct initial scenes, a diverse space of continuations of those scenes, and simulations that reflect the behaviors of the specification. An evaluation of STADA on a variety of LTLf specifications formalized in SCENEFLOW, using three complementary coverage criteria, demonstrates that STADA yields more than 2x higher coverage than the best baseline on the finest criterion and a 75% increase on the coarsest criterion. Moreover, it matches the coverage of the best baseline with 6 times fewer simulations. While set in the context of autonomous driving, the approach is applicable to other domains with rich simulation environments.
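
To make the notion of an LTLf specification concrete, the following is a minimal sketch of how such a formula can be checked against a finite trace of simulation steps. It is not STADA's implementation and does not use SCENEFLOW syntax: the tuple encoding of formulas, the function name `holds`, and the proposition names `pedestrian_ahead` and `ego_stopped` are all assumptions introduced here for illustration.

```python
from typing import Sequence, Set, Union

# A formula is either an atomic proposition (str) or a tuple
# (operator, operand, ...). This encoding is hypothetical.
Formula = Union[str, tuple]
# A trace is a finite, non-empty sequence of steps; each step is the
# set of propositions that are true at that step.
Trace = Sequence[Set[str]]


def holds(f: Formula, trace: Trace, i: int = 0) -> bool:
    """Return True if formula f holds at position i of the finite trace."""
    if isinstance(f, str):
        return f in trace[i]
    op = f[0]
    if op == "not":
        return not holds(f[1], trace, i)
    if op == "and":
        return holds(f[1], trace, i) and holds(f[2], trace, i)
    if op == "or":
        return holds(f[1], trace, i) or holds(f[2], trace, i)
    if op == "implies":
        return (not holds(f[1], trace, i)) or holds(f[2], trace, i)
    if op == "next":
        # Strong next: a successor step must exist in the finite trace.
        return i + 1 < len(trace) and holds(f[1], trace, i + 1)
    if op == "eventually":
        return any(holds(f[1], trace, j) for j in range(i, len(trace)))
    if op == "always":
        return all(holds(f[1], trace, j) for j in range(i, len(trace)))
    if op == "until":
        # f[2] holds at some step j >= i, and f[1] holds at every step before j.
        return any(
            holds(f[2], trace, j)
            and all(holds(f[1], trace, k) for k in range(i, j))
            for j in range(i, len(trace))
        )
    raise ValueError(f"unknown operator: {op!r}")


# Hypothetical requirement: whenever a pedestrian is ahead,
# the ego vehicle eventually stops.
spec = ("always", ("implies", "pedestrian_ahead", ("eventually", "ego_stopped")))

satisfying = [set(), {"pedestrian_ahead"}, {"pedestrian_ahead"}, {"ego_stopped"}]
violating = [{"pedestrian_ahead"}, set()]

print(holds(spec, satisfying))  # True
print(holds(spec, violating))   # False
```

Because the trace is finite, `next` is false at the last step and `eventually` must be discharged before the trace ends; this is the main way LTLf differs from LTL over infinite traces.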
