We introduce Free3D, a simple approach for open-set novel view synthesis (NVS) from a single image. Similar to Zero-1-to-3, we start from a pre-trained 2D image generator for generalization and fine-tune it for NVS. Compared to recent and concurrent works, we obtain significant improvements without resorting to an explicit 3D representation, which is slow and memory-consuming, or training an additional 3D network. We do so by better encoding the target camera pose via a new per-pixel ray conditioning normalization (RCN) layer. The latter injects pose information into the underlying 2D image generator by telling each pixel its specific viewing direction. We also improve multi-view consistency via a lightweight multi-view attention layer and multi-view noise sharing. We train Free3D on the Objaverse dataset and demonstrate excellent generalization to various new categories in several new datasets, including OmniObject3D and GSO. We hope our simple and effective approach will serve as a solid baseline and help future research in NVS with more accurate pose control. The project page is available at https://chuanxiaz.com/free3d/.
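To make the idea of "telling each pixel its specific viewing direction" concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a pinhole camera with intrinsics `K` and a camera-to-world pose (`R`, `t`). It computes a per-pixel ray (origin and unit direction), then applies a hypothetical RCN-style layer: features are normalized per pixel across channels, then modulated by a per-pixel scale and shift predicted linearly from the ray. The names `pixel_rays`, `ray_conditioned_norm`, `shared_noise`, `w_scale`, and `w_shift` are illustrative. The choice of LayerNorm-style normalization, the 6-D origin+direction ray encoding, and reading "noise sharing" as reusing one initial noise sample for every target view are all assumptions.

```python
import numpy as np


def pixel_rays(K, R, t, height, width):
    """Per-pixel ray origins and unit directions in world coordinates.

    Assumes a pinhole camera: K is the 3x3 intrinsics matrix, R the 3x3
    camera-to-world rotation and t the camera centre in world space.
    """
    # Pixel-centre coordinates, shape (H, W) each
    ys, xs = np.meshgrid(np.arange(height) + 0.5, np.arange(width) + 0.5, indexing="ij")
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1)  # homogeneous, (H, W, 3)
    dirs_cam = pix @ np.linalg.inv(K).T                  # back-project: K^-1 p
    dirs_world = dirs_cam @ R.T                           # rotate into world: R d
    dirs_world /= np.linalg.norm(dirs_world, axis=-1, keepdims=True)
    origins = np.broadcast_to(t, dirs_world.shape)        # all rays start at the camera centre
    return origins, dirs_world


def ray_conditioned_norm(features, rays, w_scale, w_shift, eps=1e-5):
    """Hypothetical RCN-style layer: normalize, then modulate per pixel.

    features: (H, W, C) feature map; rays: (H, W, 6) origin + direction.
    w_scale, w_shift: (6, C) hypothetical learned projections.
    """
    # Normalize each pixel's feature vector across channels (assumption)
    mean = features.mean(axis=-1, keepdims=True)
    var = features.var(axis=-1, keepdims=True)
    normed = (features - mean) / np.sqrt(var + eps)
    # Scale and shift differ per pixel because each pixel has its own ray
    scale = rays @ w_scale
    shift = rays @ w_shift
    return normed * (1.0 + scale) + shift


def shared_noise(num_views, shape, rng):
    """Start every target view from the same Gaussian noise (assumed meaning)."""
    base = rng.standard_normal(shape)
    return np.repeat(base[None], num_views, axis=0)


if __name__ == "__main__":
    H, W, C = 4, 4, 8
    K = np.array([[2.0, 0.0, 2.0], [0.0, 2.0, 2.0], [0.0, 0.0, 1.0]])
    R, t = np.eye(3), np.array([0.0, 0.0, -2.0])
    origins, dirs = pixel_rays(K, R, t, H, W)
    rays = np.concatenate([origins, dirs], axis=-1)  # (H, W, 6)
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((H, W, C))
    out = ray_conditioned_norm(feats, rays, 0.1 * rng.standard_normal((6, C)),
                               0.1 * rng.standard_normal((6, C)))
    print(out.shape)
```

The design point the sketch illustrates is that a single global pose embedding gives every pixel the same conditioning, whereas a per-pixel ray lets the modulation vary across the image in a way that follows the camera geometry.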
翻译:我们提出Free3D,一种为单张图像开放集新视角合成(NVS)设计的简易方法。与Zero-1-to-3类似,我们以预训练的2D图像生成器为基础进行泛化,并通过微调使其适用于NVS。与近期及同期工作相比,我们无需借助显式3D表示(该方法速度慢且消耗内存)或训练额外的3D网络即可获得显著改进。为此,我们通过一种新的逐像素射线条件归一化(RCN)层更好地编码目标相机姿态,该层通过告知每个像素其特定观测方向,将姿态信息注入底层2D图像生成器。此外,我们通过轻量级多视角注意力层和多视角噪声共享提升多视角一致性。我们在Objaverse数据集上训练Free3D,并在多个新数据集(包括OminiObject3D和GSO)上展现出对各类新类别的优异泛化能力。希望这种简单有效的方法能为NVS研究提供坚实基线,并助力未来更精确姿态相关研究。项目页面详见https://chuanxiaz.com/free3d/。