This paper introduces a dataset for training and evaluating methods for 6D pose estimation of hand-held tools in task demonstrations captured by a standard RGB camera. Despite significant progress in 6D pose estimation, the performance of existing methods usually degrades for heavily occluded objects. Heavy occlusion is common in imitation learning, where the manipulated object is typically partially occluded by the hand holding it. Currently, there is a lack of datasets that would enable the development of 6D pose estimation methods robust to these conditions. To address this problem, we collect a new dataset (Imitrob) aimed at 6D pose estimation in imitation learning and other applications where a human holds a tool and performs a task. The dataset contains image sequences of nine different tools and twelve manipulation tasks, recorded from two camera viewpoints with four human subjects using either the left or the right hand. Each image is accompanied by an accurate ground-truth measurement of the 6D object pose obtained with the HTC Vive motion tracking device. We demonstrate the use of the dataset by training and evaluating a recent 6D object pose estimation method (DOPE) in various setups.