Neural implicit functions have demonstrated significant importance in areas such as computer vision and graphics. Their advantages include the ability to represent complex shapes and scenes with high fidelity, smooth interpolation, and continuous representations. Despite these benefits, the development and analysis of implicit functions have been limited by the lack of comprehensive datasets and by the substantial computational resources required for their implementation and evaluation. To address these challenges, we introduce "Implicit-Zoo": a large-scale dataset, requiring thousands of GPU training days to build, designed to facilitate research and development in this field. Our dataset covers diverse 2D and 3D scenes, including CIFAR-10, ImageNet-1K, and Cityscapes for 2D image tasks, and the OmniObject3D dataset for 3D vision tasks. We ensure high quality through strict checks, refining or filtering out low-quality data. Using Implicit-Zoo, we showcase two immediate benefits: it enables us to (1) learn token locations for transformer models, and (2) directly regress the 3D camera poses of 2D images with respect to NeRF models. This in turn improves performance on all three tasks of image classification, semantic segmentation, and 3D pose regression, thereby unlocking new avenues for research.
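To make the first benefit concrete, the sketch below shows why token locations become learnable once images are stored as implicit functions: an INR can be queried at any continuous coordinate, so tokens need not come from a fixed patch grid. This is a minimal illustration with a toy randomly initialized MLP standing in for a trained INR; it is not the paper's architecture, and all names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny implicit image function: maps (x, y) in [0, 1]^2 to RGB.
# A trained INR from the dataset would play this role; weights are random here.
W1, b1 = rng.normal(size=(2, 32)), rng.normal(size=32)
W2, b2 = rng.normal(size=(32, 3)), rng.normal(size=3)

def inr(coords):
    """Query the implicit function at continuous (x, y) coordinates."""
    h = np.tanh(coords @ W1 + b1)
    return h @ W2 + b2

# Learnable token locations: 16 continuous (x, y) points instead of a fixed
# patch grid. During training these coordinates would receive gradients and
# move to informative image regions.
token_xy = rng.uniform(0.0, 1.0, size=(16, 2))
tokens = inr(token_xy)  # one feature vector per token, shape (16, 3)
print(tokens.shape)
```

Because the INR is differentiable in its input coordinates, gradients flow from the transformer's loss back into `token_xy`, which is what a pixel-grid representation cannot offer.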