ScenicRules: An Autonomous Driving Benchmark with Multi-Objective Specifications and Abstract Scenarios

From arXiv (v2): Minor numerical corrections to Table V. 16 pages, 14 figures, 7 tables. Extended version of a paper accepted to the 2026 IEEE Intelligent Vehicles Symposium (IV 2026). The ScenicRules benchmark is available at https://github.com/BerkeleyLearnVerify/ScenicRules

Developing autonomous driving systems for complex traffic environments requires balancing multiple objectives, such as avoiding collisions, obeying traffic rules, and making efficient progress. In many situations, these objectives cannot be satisfied simultaneously, and explicit priority relations naturally arise. Also, driving rules require context, so it is important to formally model the environment scenarios within which such rules apply. Existing benchmarks for evaluating autonomous vehicles lack such combinations of multi-objective prioritized rules and formal environment models. In this work, we introduce ScenicRules, a benchmark for evaluating autonomous driving systems in stochastic environments under prioritized multi-objective specifications. We first formalize a diverse set of objectives to serve as quantitative evaluation metrics. Next, we design a Hierarchical Rulebook framework that encodes multiple objectives and their priority relations in an interpretable and adaptable manner. We then construct a compact yet representative collection of scenarios spanning diverse driving contexts and near-accident situations, formally modeled in the Scenic language. Experimental results show that our formalized objectives and Hierarchical Rulebooks align well with human driving judgments and that our benchmark effectively exposes agent failures with respect to the prioritized objectives. Our benchmark can be accessed at https://github.com/BerkeleyLearnVerify/ScenicRules/.
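To make the idea of prioritized objectives concrete, the sketch below compares two driving outcomes tier by tier, so that a difference on a higher-priority rule always outweighs any difference on lower-priority ones. It is an illustration under stated assumptions, not the paper's Hierarchical Rulebook: the `Rule` class, the `compare` function, the rule names, the tier numbers, the violation scores, and the choice to sum scores within a tier are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Rule:
    name: str  # hypothetical rule identifier
    tier: int  # hypothetical priority tier; 0 is the most important


def compare(rules: List[Rule], a: Dict[str, float], b: Dict[str, float],
            tol: float = 1e-9) -> int:
    """Return -1 if `a` is preferred, 1 if `b` is preferred, 0 on a tie.

    `a` and `b` map rule names to non-negative violation scores
    (lower is better). Tiers are visited from most to least important,
    and the first tier whose summed scores differ decides the result.
    """
    for tier in sorted({r.tier for r in rules}):
        names = [r.name for r in rules if r.tier == tier]
        # Assumption: rules in the same tier are aggregated by summation.
        a_total = sum(a[n] for n in names)
        b_total = sum(b[n] for n in names)
        if abs(a_total - b_total) > tol:
            return -1 if a_total < b_total else 1
    return 0


# Illustrative priorities only: collisions over lane keeping over progress.
RULES = [
    Rule("no_collision", 0),
    Rule("stay_in_lane", 1),
    Rule("make_progress", 2),
]

# Made-up violation scores for three hypothetical outcomes.
cautious = {"no_collision": 0.0, "stay_in_lane": 0.0, "make_progress": 3.0}
swerving = {"no_collision": 0.0, "stay_in_lane": 1.5, "make_progress": 0.2}
crashing = {"no_collision": 0.4, "stay_in_lane": 0.0, "make_progress": 0.0}

# Tie on collisions, so the lane-keeping tier decides in favor of `cautious`.
print(compare(RULES, cautious, swerving))
# The collision tier decides immediately, despite `crashing` making more progress.
print(compare(RULES, cautious, crashing))
```

A strict tier-by-tier ordering is the simplest way to express "these objectives cannot be satisfied simultaneously, so priorities decide"; a real rulebook may allow rules that are incomparable with one another, which this sketch does not model.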
