Understanding dynamic 3D scenes is crucial for extended reality (XR) and autonomous driving. Incorporating semantic information into 3D reconstruction enables holistic scene representations, unlocking immersive and interactive applications. To this end, we introduce TRASE, a novel tracking-free 4D segmentation method for dynamic scene understanding. TRASE learns a 4D segmentation feature field in a weakly supervised manner, leveraging a soft-mined contrastive learning objective guided by SAM masks. The resulting feature space is semantically coherent and well separated, and the final object-level segmentation is obtained via unsupervised clustering. This enables fast editing, such as object removal, composition, and style transfer, by directly manipulating the scene's Gaussians. We evaluate TRASE on five dynamic benchmarks, demonstrating state-of-the-art segmentation from unseen viewpoints as well as effectiveness across a range of interactive editing tasks. Our project page is available at: https://yunjinli.github.io/project-sadg/
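To make the objective concrete, the following is a minimal PyTorch sketch of a SAM-mask-guided contrastive loss with soft negative mining. It is an illustrative interpretation, not the paper's implementation: the sigmoid-based soft weighting, the temperature, and all function and variable names (`soft_mined_contrastive_loss`, `features`, `mask_ids`) are assumptions. Pixels sampled from one view that share a SAM mask id are treated as positives; all other pairs are negatives.

```python
# Illustrative sketch only -- not the authors' code. The soft-mining
# weighting scheme and hyperparameters below are assumptions.
import torch
import torch.nn.functional as F

def soft_mined_contrastive_loss(features, mask_ids, temperature=0.1):
    """features: (N, D) per-pixel features rendered from the feature field.
    mask_ids: (N,) SAM mask id of each sampled pixel in the same view.
    Pixels sharing a mask id are positives; all other pairs are negatives."""
    f = F.normalize(features, dim=-1)                 # cosine-similarity space
    sim = (f @ f.t()) / temperature                   # (N, N) pairwise logits
    same = mask_ids.unsqueeze(0) == mask_ids.unsqueeze(1)
    eye = torch.eye(len(f), dtype=torch.bool, device=f.device)
    pos = same & ~eye                                 # positive pairs (no self)
    neg = ~same                                       # negative pairs

    # Soft mining: instead of hard selection, each negative is weighted by a
    # detached sigmoid of its similarity, so confusing (hard) negatives
    # dominate the denominator while easy ones are down-weighted.
    with torch.no_grad():
        w = torch.sigmoid(sim) * neg.float()

    exp_sim = torch.exp(sim)
    neg_sum = (w * exp_sim).sum(dim=1, keepdim=True)  # weighted negative mass
    log_prob = sim - torch.log(exp_sim + neg_sum)     # per positive pair

    pos_count = pos.sum(dim=1).clamp(min=1)
    loss = -(log_prob * pos.float()).sum(dim=1) / pos_count
    return loss[pos.any(dim=1)].mean()                # anchors with positives
```

In a training loop one would sample a few hundred pixels per view, render their features from the 4D feature field, and apply this loss alongside the reconstruction objective; per-Gaussian features can then be grouped with an unsupervised clustering method (e.g., a density-based algorithm such as HDBSCAN, named here only as an example) to obtain object-level segments.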