Stowing, the placement of objects into cluttered shelves or bins, is common in warehouse and manufacturing operations. It is still carried out predominantly by human workers, because complex multi-object interactions and the long-horizon nature of the task make it hard to automate. Previous approaches typically require extensive data collection and costly human labeling of semantic priors across diverse object categories. This paper presents a method for learning a generalizable robot stowing policy from a predictive model of object interactions and a single demonstration with behavior primitives. We propose a novel framework that uses Graph Neural Networks to predict object interactions within the parameter space of behavior primitives. We further employ primitive-augmented trajectory optimization to search over the parameters of a predefined library of heterogeneous behavior primitives and instantiate the control action. The framework enables robots to execute long-horizon stowing tasks using only a few keyframes (3-4) from a single demonstration. Although trained solely in simulation, the framework generalizes well: it adapts efficiently to a broad range of real-world conditions, including different shelf widths, varying numbers of objects, and objects of diverse sizes and shapes.
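To make the idea of searching primitive parameters against a predictive model concrete, the following is a minimal toy sketch and not the authors' implementation. It assumes a 1-D shelf, hand-written stand-in dynamics in place of the learned Graph Neural Network, and plain random-shooting search in place of whatever optimizer the paper uses. All names (`push_right`, `push_left`, `plan_step`, `SHELF_WIDTH`) are hypothetical.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

State = List[float]           # toy state: object centers along a 1-D shelf
Params = Tuple[float, float]  # (pusher start position, travel distance)

SHELF_WIDTH = 1.0  # assumed shelf extent for this toy example


def push_right(state: State, params: Params) -> State:
    """Stand-in for a learned model: objects in the pusher's path end at its stop."""
    start, dist = params
    end = min(start + dist, SHELF_WIDTH)
    return [end if start <= x < end else x for x in state]


def push_left(state: State, params: Params) -> State:
    """Mirror of push_right; the pusher travels toward position 0."""
    start, dist = params
    end = max(start - dist, 0.0)
    return [end if end < x <= start else x for x in state]


# Hypothetical library of heterogeneous primitives, each with its own model.
PRIMITIVES: Dict[str, Callable[[State, Params], State]] = {
    "push_right": push_right,
    "push_left": push_left,
}


def cost(state: State, goal: State) -> float:
    """Squared distance between a predicted state and a demonstration keyframe."""
    return sum((a - b) ** 2 for a, b in zip(state, goal))


def plan_step(
    state: State, goal: State, n_samples: int = 500, seed: int = 0
) -> Tuple[float, Optional[str], Optional[Params]]:
    """Random shooting: sample parameters per primitive, keep the lowest cost."""
    rng = random.Random(seed)
    best: Tuple[float, Optional[str], Optional[Params]] = (
        cost(state, goal), None, None  # baseline: do nothing
    )
    for name, model in PRIMITIVES.items():
        for _ in range(n_samples):
            params = (rng.uniform(0.0, SHELF_WIDTH), rng.uniform(0.0, SHELF_WIDTH))
            c = cost(model(state, params), goal)
            if c < best[0]:
                best = (c, name, params)
    return best


if __name__ == "__main__":
    start_state = [0.2, 0.5, 0.8]
    keyframe = [0.2, 0.7, 0.8]  # the middle object should move right
    print(plan_step(start_state, keyframe))
```

In this sketch each keyframe from the demonstration would serve as the goal for one call to `plan_step`, so a 3-4 keyframe demonstration becomes a short sequence of primitive choices. Swapping the hand-written functions for a learned predictor would leave the search loop unchanged.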