Scene-level novel view synthesis (NVS) is fundamental to many vision and graphics applications. Recently, pose-conditioned diffusion models have led to significant progress by extracting 3D information from 2D foundation models, but these methods are limited by the lack of scene-level training data. Common dataset choices either consist of isolated objects (Objaverse) or of object-centric scenes with limited pose distributions (DTU, CO3D). In this paper, we create a large-scale scene-level dataset from Internet photo collections, called MegaScenes, which contains over 100K Structure from Motion (SfM) reconstructions from around the world. Internet photos represent a scalable data source but come with challenges such as lighting variation and transient objects. We address these challenges and further curate a subset suitable for the task of NVS. Additionally, we analyze failure cases of state-of-the-art NVS methods and significantly improve generation consistency. Through extensive experiments, we validate the effectiveness of both our dataset and method on generating in-the-wild scenes. For details on the dataset and code, see our project page at https://megascenes.github.io.