Segmenting and tracking unseen object instances across discrete frames is a significant challenge in dynamic industrial robotic settings such as distribution warehouses. There, robots must cope with object rearrangement, including shifting, removal, and partial occlusion by newly placed items, and must track these items across substantial temporal gaps. The task is further complicated when robots encounter objects absent from their training sets, which requires segmenting and tracking previously unseen items. Because continuous observation is often unavailable in such settings, our task operates on a discrete set of frames separated by indefinite periods during which the scene may change substantially. The task also carries over to domestic robotic applications, such as rearranging objects on a table. To address these challenges, we introduce new synthetic and real-world datasets that replicate these industrial and household scenarios. We also propose a novel paradigm for joint segmentation and tracking in discrete frames, along with a transformer module that facilitates efficient inter-frame communication. Our experiments show that our approach significantly outperforms recent methods. For additional results and videos, please visit the \href{https://sites.google.com/view/stow-corl23}{project website}. Code and dataset will be released.