We present High-Density Visual Particle Dynamics (HD-VPD), a learned world model that can emulate the physical dynamics of real scenes by processing massive latent point clouds containing 100K+ particles. To remain efficient at this scale, we introduce a novel family of Point Cloud Transformers (PCTs) called Interlacers, which interleave linear-attention Performer layers with graph-based neighbor attention layers. We demonstrate the capabilities of HD-VPD by modeling the dynamics of high degree-of-freedom bi-manual robots observed by two RGB-D cameras. Compared to the previous graph neural network approach, our Interlacer dynamics model is twice as fast at the same prediction quality, and achieves higher quality when using 4x as many particles. We illustrate how HD-VPD can evaluate motion plan quality on robotic box-pushing and can-grasping tasks. See videos and particle dynamics rendered by HD-VPD at https://sites.google.com/view/hd-vpd.
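To make the interleaving idea concrete, the following is a minimal NumPy sketch of alternating a global linear-attention layer with a local neighbor-attention layer over a particle set. It is an illustration under stated assumptions, not the HD-VPD implementation: the function names (`linear_attention`, `neighbor_attention`, `interlaced_forward`), the use of shared queries, keys, and values without learned projections, the brute-force k-nearest-neighbor search, and all sizes are hypothetical choices made here for brevity. The linear layer uses positive random features in the style of Performer attention, so its cost grows linearly with the number of particles.

```python
import numpy as np

def linear_attention(x, w_feat):
    """Global attention in O(N) using positive random features (Performer-style).

    Assumption: queries, keys and values are all x (no learned projections).
    """
    n_feat, d = w_feat.shape
    z = x / d ** 0.25  # scale so that q.k matches the usual 1/sqrt(d) scaling

    def phi(u):
        # Positive random feature map approximating the softmax kernel.
        sq_norm = 0.5 * np.sum(u * u, axis=1, keepdims=True)
        return np.exp(u @ w_feat.T - sq_norm) / np.sqrt(n_feat)

    qf, kf = phi(z), phi(z)          # (N, m) each
    kv = kf.T @ x                    # (m, d): keys and values summarized once
    norm = qf @ kf.sum(axis=0)       # (N,): per-query normalizer
    return (qf @ kv) / norm[:, None]

def neighbor_attention(x, pos, k):
    """Softmax attention restricted to each particle's k nearest neighbors.

    Assumption: brute-force O(N^2) neighbor search, acceptable only for a toy N.
    """
    d2 = np.sum((pos[:, None, :] - pos[None, :, :]) ** 2, axis=-1)  # (N, N)
    idx = np.argsort(d2, axis=1)[:, :k]   # (N, k), includes the particle itself
    neigh = x[idx]                        # (N, k, d)
    scores = np.einsum("nd,nkd->nk", x, neigh) / np.sqrt(x.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    w = w / w.sum(axis=1, keepdims=True)
    return np.einsum("nk,nkd->nd", w, neigh)

def interlaced_forward(x, pos, w_feat, k=8, num_blocks=2):
    """Alternate global (linear) and local (neighbor) attention with residuals."""
    for _ in range(num_blocks):
        x = x + linear_attention(x, w_feat)
        x = x + neighbor_attention(x, pos, k)
    return x

# Toy usage: 64 particles with 3-D positions and 16-D latent features.
rng = np.random.default_rng(0)
pos = rng.normal(size=(64, 3))
feats = 0.1 * rng.normal(size=(64, 16))
w_feat = rng.normal(size=(32, 16))   # 32 random features (hypothetical size)
out = interlaced_forward(feats, pos, w_feat)
```

In this sketch the linear layer lets every particle exchange information with the whole cloud at linear cost, while the neighbor layer captures local geometric interactions. A realistic version would replace the brute-force neighbor search with a spatial index and add learned projections and normalization layers.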