Virtual testing using synthetic data has become a cornerstone of autonomous vehicle (AV) safety assurance. Despite progress in improving visual realism through advanced simulators and generative AI, recent studies reveal that pixel-level fidelity alone does not ensure reliable transfer from simulation to the real world. What truly matters is whether the system under test (SUT) bases its decisions on consistent decision evidence in both real and simulated environments, not merely whether images "look real" to humans. To this end, this paper proposes a behavior-grounded fidelity measure, Decisive Feature Fidelity (DFF): a new SUT-specific metric that extends the existing fidelity spectrum to capture mechanism parity, i.e., agreement in the model-specific decisive evidence that drives the SUT's decisions across domains. DFF leverages explainable-AI methods to identify and compare the decisive features driving the SUT's outputs for matched real-synthetic pairs. We further propose estimators based on counterfactual explanations, along with a DFF-guided calibration scheme for enhancing simulator fidelity. Experiments on 2126 matched KITTI-VirtualKITTI2 pairs demonstrate that DFF reveals discrepancies overlooked by conventional output-value fidelity. Moreover, results show that DFF-guided calibration improves decisive-feature and input-level fidelity without sacrificing output-value fidelity across diverse SUTs.