Autonomous vehicles (AVs) make driving decisions without human intervention, so assuring their dependability is critical. Scenario-based testing is widely used to evaluate AVs under diverse conditions, with reinforcement learning (RL) used to generate test scenarios that expose violations of functional and safety requirements. Many requirements are interdependent and involve trade-offs, so it is unclear whether single-objective RL (SORL), which combines objectives into a single reward, can reliably reveal violations or whether multi-objective RL (MORL), which explicitly considers multiple objectives, is necessary. We present an empirical evaluation comparing SORL and MORL for generating critical scenarios that simultaneously test interdependent requirements, using an end-to-end AV controller and a high-fidelity simulator. The results suggest that MORL and SORL are comparably effective in many cases and differ mainly in how violations occur: MORL tends to generate more requirement-violating scenarios, whereas SORL produces higher-severity violations. Their relative performance also depends on the specific combination of objectives and, to a lesser extent, on road conditions. Regarding diversity, MORL consistently covers a broader range of scenarios. Thus, MORL is preferable when scenario diversity and coverage are prioritized, whereas SORL may better expose severe violations. Our evaluation addresses a gap by systematically comparing SORL and MORL, and it highlights the importance of requirement dependencies in RL-based AV testing.