Language-Conditioned Robotic Grasping (LCRG) aims to develop robots that ground and grasp objects based on natural language instructions. While robots capable of recognizing personal objects such as "my wallet" can interact more naturally with non-expert users, current LCRG systems largely limit robots to understanding generic expressions. To this end, we introduce GraspMine, a task scenario with a novel dataset that requires locating and grasping personal objects given personal indicators by learning from a single human-robot interaction. To address GraspMine, we propose the Personalized Grasping Agent (PGA), which learns personal objects by propagating user-given information through a Reminiscence, a collection of raw images from the user's environment. Specifically, PGA acquires personal object information when a user presents a personal object along with its associated indicator; PGA then inspects the object by rotating it. Based on the acquired information, PGA pseudo-labels objects in the Reminiscence using our proposed label propagation algorithm. Using both the information acquired from the interaction and the pseudo-labeled objects in the Reminiscence, PGA adapts the object grounding model to grasp personal objects. Experiments on GraspMine show that PGA significantly outperforms baseline methods in both offline and online settings, demonstrating its effectiveness and its applicability to personalization in real-world scenarios. Finally, a qualitative analysis examines the results of each phase in detail to further illustrate the effectiveness of PGA.
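The abstract does not specify how the label propagation step works. Purely as an illustration, the sketch below shows one generic form of similarity-based propagation that could pseudo-label Reminiscence objects. It assumes that every object crop in the Reminiscence and every view captured while inspecting the personal object has already been embedded by some visual encoder. The names `cosine` and `propagate_labels`, the 0.9 threshold, and the chaining rule are hypothetical choices, not PGA's actual algorithm.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def propagate_labels(
    seed_views: List[Vector],
    candidates: Dict[str, Vector],
    threshold: float = 0.9,
) -> List[str]:
    """Hypothetical pseudo-labeling by iterative similarity propagation.

    seed_views: embeddings of the personal object from the interaction
        (e.g. views captured while it is rotated); assumed precomputed.
    candidates: object id -> embedding of an object crop in the Reminiscence.
    A candidate is pseudo-labeled when it is close enough to ANY anchor.
    Newly labeled candidates become anchors too, so the label can spread
    along chains to views that look unlike every seed view.
    """
    anchors: List[Vector] = list(seed_views)
    labeled: List[str] = []
    changed = True
    while changed:  # repeat until a full pass adds no new labels
        changed = False
        for obj_id, emb in candidates.items():
            if obj_id in labeled:
                continue
            if any(cosine(emb, a) >= threshold for a in anchors):
                labeled.append(obj_id)
                anchors.append(emb)  # the new label now acts as an anchor
                changed = True
    return labeled


if __name__ == "__main__":
    seed = [[1.0, 0.0]]  # toy 2-D embedding of the presented object
    crops = {
        "crop_b": [0.7660, 0.6428],  # 40 degrees from the seed
        "crop_a": [0.9397, 0.3420],  # 20 degrees from the seed
        "crop_c": [0.0, 1.0],        # 90 degrees: an unrelated object
    }
    print(propagate_labels(seed, crops))  # → ['crop_a', 'crop_b']
```

In this toy case `crop_b` is too far from the seed to be labeled directly, but it is reached through `crop_a`, which is the kind of spreading that lets a single interaction cover more appearances of the object. A real system would also need to guard against propagation drifting onto similar-looking objects, for example with a stricter threshold.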