Dynamic objects in our physical 4D (3D + time) world are constantly evolving, deforming, and interacting with other objects, giving rise to diverse 4D scene dynamics. In this paper, we present a universal generative pipeline, CHORD, for CHOReographing Dynamic objects and scenes and synthesizing such phenomena. Traditional rule-based graphics pipelines for creating these dynamics rely on category-specific heuristics that are labor-intensive and do not scale. Recent learning-based methods typically demand large-scale datasets, which may not cover all object categories of interest. Our approach instead inherits its universality from video generative models through a distillation-based pipeline that extracts the rich Lagrangian motion information hidden in the Eulerian representations of 2D videos. Our method is universal, versatile, and category-agnostic. We demonstrate its effectiveness through experiments generating a diverse range of multi-body 4D dynamics, show its advantages over existing methods, and demonstrate its applicability to generating robotic manipulation policies. Project page: https://yanzhelyu.github.io/chord
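To make the Eulerian-to-Lagrangian distinction concrete: an Eulerian representation describes motion at fixed pixel locations (e.g., per-frame optical flow), whereas a Lagrangian representation follows individual points over time (trajectories). The following is a minimal illustrative sketch, not CHORD's actual pipeline, showing one standard way to recover Lagrangian trajectories from Eulerian flow fields by chaining flow across frames; the function name `advect_points` and the assumption of dense optical flow inputs are hypothetical choices for this example.

```python
import numpy as np

def advect_points(flows, points):
    """Chain per-frame optical flow (Eulerian) into point trajectories (Lagrangian).

    flows:  list of (H, W, 2) arrays; flows[t][y, x] is the displacement of
            pixel (x, y) from frame t to frame t+1.  (Illustrative assumption,
            not the representation used in the paper.)
    points: (N, 2) array of initial (x, y) positions at frame 0.

    Returns a (T+1, N, 2) array of positions, one trajectory per point.
    """
    H, W, _ = flows[0].shape
    traj = [points.astype(np.float64)]
    for flow in flows:
        p = traj[-1]
        # Bilinearly sample the flow field at each point's current (subpixel) position.
        x = np.clip(p[:, 0], 0, W - 1)
        y = np.clip(p[:, 1], 0, H - 1)
        x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
        x1, y1 = np.minimum(x0 + 1, W - 1), np.minimum(y0 + 1, H - 1)
        wx, wy = x - x0, y - y0
        f = (flow[y0, x0] * ((1 - wx) * (1 - wy))[:, None]
             + flow[y0, x1] * (wx * (1 - wy))[:, None]
             + flow[y1, x0] * ((1 - wx) * wy)[:, None]
             + flow[y1, x1] * (wx * wy)[:, None])
        # Advance each tracked point by the sampled displacement.
        traj.append(p + f)
    return np.stack(traj)
```

Naive flow chaining like this accumulates drift and breaks under occlusion, which is part of why extracting reliable Lagrangian motion from 2D videos is nontrivial; the sketch is only meant to clarify the two representations.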