Pose estimation is a crucial task in computer vision and robotics, enabling the tracking and manipulation of objects in images or videos. While several datasets exist for pose estimation, large-scale datasets that focus specifically on cluttered scenes with occlusions are lacking. We introduce PACE (Pose Annotations in Cluttered Environments), a large-scale benchmark designed to advance the development and evaluation of pose estimation methods in cluttered scenarios. PACE consists of 54,945 frames with 257,673 annotations across 300 videos, covering 576 objects from 44 categories and featuring a mix of rigid and articulated items in cluttered scenes. To annotate the real-world data efficiently, we developed an innovative annotation system that uses a calibrated 3-camera setup. We test state-of-the-art algorithms on PACE along two tracks, pose estimation and object pose tracking, revealing the benchmark's challenges and research opportunities. Our code and data are available at https://github.com/qq456cvb/PACE.