3D generation on ImageNet

Existing 3D-from-2D generators are typically designed for well-curated single-category datasets, where all objects have (approximately) the same scale, 3D location, and orientation, and the camera always points at the center of the scene. This makes them inapplicable to diverse, in-the-wild datasets of non-alignable scenes rendered from arbitrary camera poses. In this work, we develop a 3D generator with Generic Priors (3DGP): a 3D synthesis framework that makes more general assumptions about the training data, and we show that it scales to very challenging datasets such as ImageNet. Our model is based on three new ideas. First, we incorporate an inaccurate off-the-shelf depth estimator into 3D GAN training via a special depth adaptation module that handles the imprecision. Second, we create a flexible camera model, together with a regularization strategy that lets it learn its distribution parameters during training. Finally, we extend recent ideas on transferring knowledge from pre-trained classifiers into GANs to patch-wise trained models by employing a simple distillation-based technique on top of the discriminator. This technique achieves more stable training than existing methods and speeds up convergence by at least 40%. We explore our model on four datasets: SDIP Dogs 256x256, SDIP Elephants 256x256, LSUN Horses 256x256, and ImageNet 256x256, and demonstrate that 3DGP outperforms the recent state-of-the-art in terms of both texture and geometry quality. Code and visualizations: https://snap-research.github.io/3dgp.
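
The third idea, distillation on top of the discriminator, can be pictured with a small sketch. The abstract does not specify the loss, so everything below is an assumption made for illustration: the discriminator is imagined to have an extra head that predicts the feature vector a frozen pre-trained classifier produces for a real image, and the mismatch is penalized with a mean squared error added to a non-saturating adversarial loss. The names `l2_distillation_loss`, `discriminator_loss`, and `distill_weight` are hypothetical and are not taken from the 3DGP code.

```python
import math
from typing import Sequence


def l2_distillation_loss(pred_feats: Sequence[float],
                         target_feats: Sequence[float]) -> float:
    """Mean squared error between two feature vectors.

    pred_feats:   features predicted by a (hypothetical) discriminator head.
    target_feats: features from a frozen pre-trained classifier (assumed).
    """
    if len(pred_feats) != len(target_feats):
        raise ValueError("feature vectors must have the same length")
    squared_diffs = ((p - t) ** 2 for p, t in zip(pred_feats, target_feats))
    return sum(squared_diffs) / len(pred_feats)


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def discriminator_loss(real_logit: float,
                       fake_logit: float,
                       pred_feats: Sequence[float],
                       target_feats: Sequence[float],
                       distill_weight: float = 1.0) -> float:
    """Adversarial loss plus a weighted distillation term (illustrative only)."""
    # Non-saturating GAN loss for the discriminator; an assumed choice.
    adversarial = softplus(-real_logit) + softplus(fake_logit)
    # Distillation term, computed here for a single real image.
    distill = l2_distillation_loss(pred_feats, target_feats)
    return adversarial + distill_weight * distill


if __name__ == "__main__":
    loss = discriminator_loss(
        real_logit=1.5,
        fake_logit=-0.5,
        pred_feats=[0.2, 0.8, -0.1],
        target_feats=[0.0, 1.0, 0.0],
    )
    print(f"discriminator loss: {loss:.4f}")
```

In this sketch the classifier only supplies regression targets, so no gradient would flow through it. Whether 3DGP weights or schedules the distillation term this way is not stated in the abstract.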


Related content

ImageNet (dataset)

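The ImageNet project is a large visual database designed for research on visual object recognition software. More than 14 million image URLs have been hand-annotated by ImageNet to indicate which objects are pictured, and bounding boxes are provided for at least one million of the images. ImageNet contains more than 20,000 categories; a typical category, such as "balloon" or "strawberry", consists of several hundred images. The database of annotations for third-party image URLs is freely available directly from ImageNet, although the images themselves do not belong to ImageNet. Since 2010, the ImageNet project has run an annual software contest, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), in which programs compete to correctly classify and detect objects and scenes. The challenge uses a "trimmed" list of 1,000 non-overlapping classes. The 2012 challenge brought a major breakthrough that is widely regarded as the start of the deep learning revolution of the 2010s.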
