We present ScanNet++, a large-scale dataset that couples the capture of high-quality and commodity-level geometry and color of indoor scenes. Each scene is captured with a high-end laser scanner at sub-millimeter resolution, along with registered 33-megapixel images from a DSLR camera and RGB-D streams from an iPhone. Scene reconstructions are further annotated with an open vocabulary of semantics, with label-ambiguous scenarios explicitly annotated for comprehensive semantic understanding. ScanNet++ enables a new real-world benchmark for novel view synthesis, both from high-quality RGB capture and, importantly, from commodity-level images, as well as a new benchmark for 3D semantic scene understanding that comprehensively covers diverse and ambiguous semantic labeling scenarios. Currently, ScanNet++ contains 460 scenes, 280,000 captured DSLR images, and over 3.7M iPhone RGB-D frames.