Virtual testing using synthetic data has become a cornerstone of autonomous vehicle (AV) safety assurance. Despite progress in improving visual realism through advanced simulators and generative AI, recent studies reveal that pixel-level fidelity alone does not ensure reliable transfer from simulation to the real world. What truly matters is whether the system under test (SUT) bases its decisions on consistent decision evidence in both real and simulated environments, not merely whether images "look real" to humans. To this end, this paper proposes a behavior-grounded fidelity measure, Decisive Feature Fidelity (DFF): a new SUT-specific metric that extends the existing fidelity spectrum to capture mechanism parity, that is, agreement in the model-specific decisive evidence that drives the SUT's decisions across domains. DFF leverages explainable-AI methods to identify and compare the decisive features driving the SUT's outputs for matched real-synthetic pairs. We further propose estimators based on counterfactual explanations, along with a DFF-guided calibration scheme to enhance simulator fidelity. Experiments on 2126 matched KITTI-VirtualKITTI2 pairs demonstrate that DFF reveals discrepancies overlooked by conventional output-value fidelity. Results further show that DFF-guided calibration improves decisive-feature and input-level fidelity without sacrificing output-value fidelity across diverse SUTs.
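To make the notion of decisive-feature agreement concrete, the sketch below shows one illustrative (hypothetical) proxy: given per-pixel attribution maps produced by an explainable-AI method for a matched real-synthetic pair, it scores agreement as the intersection-over-union of each map's most influential pixels. The function name, the top-fraction parameter, and the IoU formulation are assumptions for illustration only; the paper's actual DFF estimators are based on counterfactual explanations and are not reproduced here.

```python
import numpy as np

def decisive_feature_agreement(attr_real, attr_syn, top_frac=0.05):
    """Toy agreement score between two attribution maps (hypothetical proxy).

    Selects the top `top_frac` fraction of most influential pixels in each
    map and returns the intersection-over-union of the two selections.
    A score of 1.0 means the SUT relies on the same decisive pixels in
    both domains; 0.0 means the decisive evidence is disjoint.
    """
    k = max(1, int(top_frac * attr_real.size))
    # Indices of the k highest-attribution pixels in each flattened map.
    top_real = set(np.argsort(attr_real.ravel())[-k:])
    top_syn = set(np.argsort(attr_syn.ravel())[-k:])
    return len(top_real & top_syn) / len(top_real | top_syn)

# Example: identical attribution maps yield perfect agreement.
rng = np.random.default_rng(0)
a = rng.random((8, 8))
print(decisive_feature_agreement(a, a))  # 1.0
```

A lower score on matched real-synthetic pairs would flag exactly the kind of mechanism-level discrepancy that output-value fidelity alone can miss, since two models can produce similar outputs from different evidence.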