Stowing, the task of placing objects in cluttered shelves or bins, is common in warehouse and manufacturing operations. However, it is still predominantly carried out by human workers, because the complex multi-object interactions and long-horizon nature of the task make it challenging to automate. Previous works typically require extensive data collection and costly human labeling of semantic priors across diverse object categories. This paper presents a method to learn a generalizable robot stowing policy from a predictive model of object interactions and a single demonstration with behavior primitives. We propose a novel framework that uses Graph Neural Networks to predict object interactions within the parameter space of behavior primitives. We further employ primitive-augmented trajectory optimization to search the parameters of a predefined library of heterogeneous behavior primitives and thereby instantiate the control action. Our framework enables robots to execute long-horizon stowing tasks proficiently with a few keyframes (3-4) from a single demonstration. Despite being trained solely in simulation, our framework generalizes well: it adapts to a broad spectrum of real-world conditions, including various shelf widths, varying numbers of objects, and objects of diverse sizes and shapes.
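The idea of searching the parameters of a primitive library against a predictive model can be illustrated with a small sketch. It is not the authors' implementation: the 1-D shelf, the two primitives (`sweep`, `insert`), their parameter bounds, and the function names are all hypothetical; a hand-written rule stands in for the Graph Neural Network, and plain random shooting stands in for the trajectory optimizer.

```python
import random

# Hypothetical stand-in for the learned dynamics model: the paper uses a
# Graph Neural Network; here a hand-written 1-D rule plays that role.
def predict(state, primitive, params):
    """Return the predicted next state (sorted tuple of object x-positions)."""
    if primitive == "sweep":
        start, dist = params
        end = start + dist
        # Objects inside the swept interval are pushed to its end point.
        moved = [end if start <= x <= end else x for x in state]
    elif primitive == "insert":
        (x_new,) = params
        moved = list(state) + [x_new]
    else:
        raise ValueError(f"unknown primitive: {primitive}")
    return tuple(sorted(moved))

# Hypothetical primitive library: name -> per-parameter (low, high) bounds.
PRIMITIVES = {
    "sweep": [(0.0, 1.0), (0.0, 0.5)],
    "insert": [(0.0, 1.0)],
}

def cost(state, keyframe):
    """Squared distance to the keyframe; mismatched object counts are penalized."""
    if len(state) != len(keyframe):
        return 1e3 + abs(len(state) - len(keyframe))
    return sum((a - b) ** 2 for a, b in zip(state, sorted(keyframe)))

def plan_step(state, keyframe, n_samples=2000, seed=0):
    """Random-shooting search over every primitive and its parameters."""
    rng = random.Random(seed)
    best = (float("inf"), None, None)
    for name, bounds in PRIMITIVES.items():
        for _ in range(n_samples):
            params = tuple(rng.uniform(lo, hi) for lo, hi in bounds)
            c = cost(predict(state, name, params), keyframe)
            if c < best[0]:
                best = (c, name, params)
    return best

def plan(state, keyframes):
    """Choose one primitive per keyframe, rolling the model forward each time."""
    actions = []
    for kf in keyframes:
        _, name, params = plan_step(state, kf)
        actions.append((name, params))
        state = predict(state, name, params)
    return actions, state

# Two demonstration keyframes: first make room, then place the new object.
actions, final_state = plan((0.2, 0.5), [(0.2, 0.8), (0.2, 0.5, 0.8)])
```

In this toy setting the keyframes play the role of subgoals: each one is matched by whichever primitive and parameter sample the model predicts will land closest to it, so the choice of primitive falls out of the same search as its parameters.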