Despite recent progress in 6D object pose estimation for robotic grasping, a substantial gap persists between the performance of these methods on existing datasets and their efficacy in real-world mobile manipulation tasks, particularly when the robot relies solely on its monocular egocentric field of view (FOV). Existing real-world datasets focus primarily on table-top grasping scenarios, in which a robotic arm is mounted in a fixed position and the objects are centered within the FOV of one or more fixed external cameras. Performance on such datasets may therefore not reflect the challenges encountered in everyday mobile manipulation tasks in kitchen environments, such as retrieving objects from high shelves, sinks, dishwashers, ovens, refrigerators, or microwaves. To address this gap, we present KITchen, a novel benchmark designed specifically for estimating the 6D poses of objects located in diverse positions within kitchen settings. For this purpose, we recorded a comprehensive dataset comprising around 205k real-world RGBD images of 111 kitchen objects, captured in two distinct kitchens from the egocentric perspective of a humanoid robot. We also developed a semi-automated annotation pipeline that streamlines the labeling of such datasets and generates 2D object labels, 2D object segmentation masks, and 6D object poses with minimal human effort. The benchmark, the dataset, and the annotation pipeline are available at https://kitchen-dataset.github.io/KITchen.