ScenicRules: An Autonomous Driving Benchmark with Multi-Objective Specifications and Abstract Scenarios

From arXiv: 16 pages, 14 figures, 7 tables. Extended version of the paper accepted to the 2026 IEEE Intelligent Vehicles Symposium (IV 2026). The ScenicRules benchmark is available at https://github.com/BerkeleyLearnVerify/ScenicRules

Developing autonomous driving systems for complex traffic environments requires balancing multiple objectives, such as avoiding collisions, obeying traffic rules, and making efficient progress. In many situations, these objectives cannot be satisfied simultaneously, and explicit priority relations naturally arise. Also, driving rules require context, so it is important to formally model the environment scenarios within which such rules apply. Existing benchmarks for evaluating autonomous vehicles lack such combinations of multi-objective prioritized rules and formal environment models. In this work, we introduce ScenicRules, a benchmark for evaluating autonomous driving systems in stochastic environments under prioritized multi-objective specifications. We first formalize a diverse set of objectives to serve as quantitative evaluation metrics. Next, we design a Hierarchical Rulebook framework that encodes multiple objectives and their priority relations in an interpretable and adaptable manner. We then construct a compact yet representative collection of scenarios spanning diverse driving contexts and near-accident situations, formally modeled in the Scenic language. Experimental results show that our formalized objectives and Hierarchical Rulebooks align well with human driving judgments and that our benchmark effectively exposes agent failures with respect to the prioritized objectives. Our benchmark can be accessed at https://github.com/BerkeleyLearnVerify/ScenicRules/.
