Realizing generalizable dynamic object manipulation on conveyor systems is important for improving manufacturing efficiency, because it removes the need for scenario-specific engineering. To this end, imitation learning has emerged as a promising paradigm that leverages expert demonstrations to teach manipulation skills to a policy. Although the generalization of an imitation learning policy can be improved by collecting more demonstrations, demonstration collection is labor-intensive. Moreover, publicly available data for dynamic object manipulation is scarce. In this work, we address this data scarcity problem by generating demonstrations in a simulator. A significant challenge in using simulated data is the appearance gap between simulated and real-world observations. To tackle this challenge, we propose the Geometry-Enhanced Model (GEM), which employs our appearance noise annealing strategy to shape the policy optimization path, thereby prioritizing the geometric information in observations. Extensive experiments on simulated and real-world tasks demonstrate that GEM generalizes across environment backgrounds, robot embodiments, motion dynamics, and object geometries. Notably, GEM has been deployed in a real canteen for tableware collection. Without any test-scene data, GEM achieves a success rate of over 97% across more than 10,000 operations.
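The abstract does not specify how appearance noise annealing works. The following is only a generic sketch under stated assumptions, not GEM's actual method. It assumes that observations consist of an RGB image (appearance) and a depth map (geometry), that Gaussian noise is added to the RGB channels only, and that the noise standard deviation is annealed linearly from a high value early in training to a low value later. The function names `appearance_noise_scale` and `perturb_appearance` and the linear schedule are all hypothetical.

```python
import numpy as np


def appearance_noise_scale(step: int, total_steps: int,
                           sigma_start: float = 1.0, sigma_end: float = 0.0) -> float:
    """Hypothetical linear schedule: noise std goes from sigma_start to sigma_end."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Fraction of training completed, clamped to [0, 1].
    frac = min(max(step / total_steps, 0.0), 1.0)
    return sigma_start + frac * (sigma_end - sigma_start)


def perturb_appearance(rgb: np.ndarray, depth: np.ndarray, sigma: float,
                       rng: np.random.Generator) -> tuple[np.ndarray, np.ndarray]:
    """Add Gaussian noise to RGB only (assumed range [0, 1]); depth passes through unchanged."""
    noisy_rgb = rgb + rng.normal(0.0, sigma, size=rgb.shape)
    return np.clip(noisy_rgb, 0.0, 1.0), depth


# Usage: heavy appearance noise early in training pushes the policy to rely on
# the unperturbed geometry channel; the noise then fades so appearance cues are
# gradually reintroduced (an assumed ordering, not one taken from the paper).
rng = np.random.default_rng(0)
rgb = rng.random((4, 4, 3))
depth = rng.random((4, 4))
for step in (0, 500, 1000):
    sigma = appearance_noise_scale(step, total_steps=1000)
    noisy_rgb, same_depth = perturb_appearance(rgb, depth, sigma, rng)
```

Perturbing only the appearance channel keeps geometry a consistent signal throughout training. This is one plausible way a noise schedule could bias the optimization path toward geometric features.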