We present the HANDAL dataset for category-level object pose estimation and affordance prediction. Unlike previous datasets, ours is focused on robotics-ready manipulable objects that are of the proper size and shape for functional grasping by robot manipulators, such as pliers, utensils, and screwdrivers. Our annotation process is streamlined, requiring only a single off-the-shelf camera and semi-automated processing, allowing us to produce high-quality 3D annotations without crowd-sourcing. The dataset consists of 308k annotated image frames from 2.2k videos of 212 real-world objects in 17 categories. We focus on hardware and kitchen tool objects to facilitate research in practical scenarios in which a robot manipulator needs to interact with the environment beyond simple pushing or indiscriminate grasping. We outline the usefulness of our dataset for 6-DoF category-level pose+scale estimation and related tasks. We also provide 3D reconstructed meshes of all objects, and we outline some of the bottlenecks to be addressed for democratizing the collection of datasets like this one.