Scaling generative inverse and forward rendering to real-world scenarios is bottlenecked by the limited realism and temporal coherence of existing synthetic datasets. To bridge this persistent domain gap, we introduce a large-scale, dynamic dataset curated from visually complex AAA games. Using a novel dual-screen stitched capture method, we extract 4M continuous frames (720p, 30 FPS) of synchronized RGB and five G-buffer channels across diverse scenes, visual effects, and environments, including adverse-weather and motion-blur variants. The dataset advances rendering in both directions: it enables robust in-the-wild geometry and material decomposition, and it supports high-fidelity G-buffer-guided video generation. Furthermore, to evaluate the real-world performance of inverse rendering when no ground truth is available, we propose a novel VLM-based assessment protocol that measures semantic, spatial, and temporal consistency. Experiments demonstrate that inverse renderers fine-tuned on our data achieve superior cross-dataset generalization and controllable generation, and that our VLM evaluation correlates strongly with human judgment. Combined with our toolkit, our forward renderer lets users edit the visual style of AAA games from G-buffers using text prompts.