Scene simulation in autonomous driving has gained significant attention because of its potential for generating customized data. However, existing editable scene simulation approaches are limited in user interaction efficiency, multi-camera photo-realistic rendering, and integration of external digital assets. To address these challenges, this paper introduces ChatSim, the first system that enables editable photo-realistic 3D driving scene simulation via natural language commands with external digital assets. To enable editing with high command flexibility, ChatSim leverages a large language model (LLM) agent collaboration framework. To generate photo-realistic outcomes, ChatSim employs a novel multi-camera neural radiance field method. Furthermore, to unleash the potential of extensive high-quality digital assets, ChatSim employs a novel multi-camera lighting estimation method to achieve scene-consistent asset rendering. Our experiments on the Waymo Open Dataset demonstrate that ChatSim can handle complex language commands and generate corresponding photo-realistic scene videos.