We introduce WildRGB-D, a new RGB-D object dataset captured in the wild. Unlike most existing real-world object-centric datasets, which provide only RGB captures, WildRGB-D records the depth channel directly, enabling better 3D annotations and a broader range of downstream applications. WildRGB-D comprises large-scale, category-level RGB-D object videos, each captured with an iPhone moved 360 degrees around the objects. It contains around 8,500 recorded objects and nearly 20,000 RGB-D videos across 46 common object categories. The videos are recorded against diverse, cluttered backgrounds under three setups designed to cover as many real-world scenarios as possible: (i) a single object per video; (ii) multiple objects per video; and (iii) an object together with a static hand per video. The dataset is annotated with object masks, real-world-scale camera poses, and aggregated point clouds reconstructed from the RGB-D videos. We benchmark four tasks on WildRGB-D: novel view synthesis, camera pose estimation, object 6D pose estimation, and object surface reconstruction. Our experiments show that large-scale capture of RGB-D objects has great potential to advance 3D object learning. Our project page is https://wildrgbd.github.io/.
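The abstract does not describe how the aggregated point clouds are built from the RGB-D videos and camera poses. The sketch below shows one standard way to do it, and it rests on several assumptions: a pinhole camera model with intrinsics `fx`, `fy`, `cx`, `cy`, depth maps in metres, optional per-frame object masks, and 4x4 camera-to-world poses. The function names `backproject`, `to_world`, and `aggregate` are hypothetical illustrations, not the WildRGB-D toolkit's API or the authors' actual reconstruction pipeline.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]
Grid = Sequence[Sequence[float]]  # row-major 2D array (depth map or 4x4 matrix)


def backproject(depth: Grid, fx: float, fy: float, cx: float, cy: float,
                mask: Optional[Sequence[Sequence[bool]]] = None) -> List[Point]:
    """Lift a metric depth map to camera-frame 3D points (pinhole model, assumed)."""
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # missing or invalid depth reading
            if mask is not None and not mask[v][u]:
                continue  # pixel lies outside the (hypothetical) object mask
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def to_world(points: List[Point], cam_to_world: Grid) -> List[Point]:
    """Apply a 4x4 rigid camera-to-world transform [R | t] to each point."""
    T = cam_to_world
    return [
        tuple(T[i][0] * x + T[i][1] * y + T[i][2] * z + T[i][3] for i in range(3))
        for x, y, z in points
    ]


def aggregate(frames, fx: float, fy: float, cx: float, cy: float) -> List[Point]:
    """Fuse (depth, mask, pose) frames into one world-frame point cloud."""
    cloud: List[Point] = []
    for depth, mask, pose in frames:
        cloud.extend(to_world(backproject(depth, fx, fy, cx, cy, mask), pose))
    return cloud
```

Because the camera poses are at real-world scale, points from different frames land in one consistent metric frame. A practical pipeline would typically also downsample or fuse overlapping points, which this sketch omits.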