Learned visual dynamics models have proven effective for robotic manipulation tasks. Yet it remains unclear how best to represent scenes involving multi-object interactions. Current methods decompose a scene into discrete objects, but they struggle with precise modeling and manipulation under challenging lighting conditions because they encode only appearance tied to a specific illumination. In this work, we propose using object-centric neural scattering functions (OSFs) as object representations in a model-predictive control framework. OSFs model per-object light transport, enabling compositional scene re-rendering under object rearrangement and varying lighting conditions. By combining this approach with inverse parameter estimation and graph-based neural dynamics models, we demonstrate improved model-predictive control performance and generalization in compositional multi-object environments, even in previously unseen scenarios and harsh lighting conditions.
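As an illustrative sketch only, and not the authors' implementation, the model-predictive control loop mentioned in the abstract can be pictured as random-shooting planning over a dynamics model. The names `toy_dynamics`, `plan_random_shooting`, and `run_mpc` are hypothetical, and the additive point-mass dynamics are a placeholder assumption standing in for the learned graph-based model. OSF rendering and inverse parameter estimation are not modeled here.

```python
import numpy as np


def toy_dynamics(state, action):
    """Hypothetical stand-in for a learned dynamics model (assumption):
    the state simply moves by the applied action."""
    return state + action


def plan_random_shooting(state, goal, dynamics, horizon=5, num_samples=256, rng=None):
    """Sample random action sequences, roll each one out through the
    dynamics model, and return the first action of the cheapest sequence."""
    rng = np.random.default_rng(0) if rng is None else rng
    # Candidate action sequences with shape (num_samples, horizon, state_dim).
    actions = rng.uniform(-0.1, 0.1, size=(num_samples, horizon, state.shape[0]))
    costs = np.empty(num_samples)
    for i in range(num_samples):
        s = state.copy()
        for t in range(horizon):
            s = dynamics(s, actions[i, t])
        # Cost: squared distance between the predicted final state and the goal.
        costs[i] = np.sum((s - goal) ** 2)
    best = int(np.argmin(costs))
    return actions[best, 0], costs[best]


def run_mpc(state, goal, dynamics, steps=20):
    """Receding-horizon loop: replan at every step, execute only the first action."""
    rng = np.random.default_rng(0)
    for _ in range(steps):
        action, _ = plan_random_shooting(state, goal, dynamics, rng=rng)
        state = dynamics(state, action)
    return state
```

In a setting like the one the abstract describes, the state would presumably hold per-object quantities and the dynamics function would be the learned graph-based model; the replanning structure of the loop would stay the same.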