Scene simulation in autonomous driving has gained significant attention because of its huge potential for generating customized data. However, existing editable scene simulation approaches face limitations in user interaction efficiency, multi-camera photo-realistic rendering, and external digital asset integration. To address these challenges, this paper introduces ChatSim, the first system that enables editable photo-realistic 3D driving scene simulation via natural language commands with external digital assets. To enable editing with high command flexibility, ChatSim leverages a large language model (LLM) agent collaboration framework. To generate photo-realistic outcomes, ChatSim employs a novel multi-camera neural radiance field method. Furthermore, to unleash the potential of extensive high-quality digital assets, ChatSim employs a novel multi-camera lighting estimation method to achieve scene-consistent asset rendering. Our experiments on the Waymo Open Dataset demonstrate that ChatSim can handle complex language commands and generate corresponding photo-realistic scene videos.