Long-tail hazardous scenarios are essential for safety-oriented autonomous driving, yet they are difficult to collect and reproduce at scale. Editable 3D Gaussian Splatting (3DGS) simulation offers a promising alternative by reconstructing real driving scenes and supporting controllable scene editing. However, edited 3DGS-rendered videos still suffer from a significant Sim-to-Real gap, including rendering artifacts, degraded foreground assets, inconsistent illumination, and temporal flickering. Existing restoration and video generation methods are insufficient for this task, as they often fail to jointly repair 3DGS-specific artifacts, improve visual realism, and ensure temporal consistency. To fill this gap, we propose RealityBridge, a structure-preserving and asset-aware Sim-to-Real framework for edited 3DGS driving videos. RealityBridge uses multimodal controls, including rendered videos, foreground masks, edge maps, and semantic masks, together with a lightweight GateNet for adaptive condition allocation across backbone layers. We further construct targeted training data and introduce autoregressive long-video training with reward-guided post-training to improve restoration quality, temporal stability, and hallucination suppression. Extensive experiments on internal and public driving datasets show that RealityBridge outperforms existing methods in artifact removal, illumination harmonization, and long-sequence temporal consistency.