We present Free-Range Gaussians, a multi-view reconstruction method that predicts 3D Gaussians aligned to neither pixels nor voxels from as few as four images. It does so by flow matching over Gaussian parameters. This generative formulation of reconstruction allows the model to be supervised with non-grid-aligned 3D data and enables it to synthesize plausible content in unobserved regions. It thereby improves on prior methods, which produce highly redundant grid-aligned Gaussians and suffer from holes or blurry conditional means in unobserved regions. To handle the number of Gaussians needed for high-quality results, we introduce a hierarchical patching scheme that groups spatially related Gaussians into joint transformer tokens, halving the sequence length while preserving structure. We further propose a timestep-weighted rendering loss during training, and photometric gradient guidance and classifier-free guidance at inference, to improve fidelity. Experiments on Objaverse and Google Scanned Objects show consistent improvements over pixel- and voxel-aligned methods while using significantly fewer Gaussians, with large gains when the input views leave parts of the object unobserved.
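As a rough illustration of what flow matching over Gaussian parameters with classifier-free guidance at inference can look like, the following minimal sketch integrates a velocity field from noise to a set of Gaussian parameters with Euler steps. It is not the authors' implementation. Everything in it is an assumption made for illustration: the linear (rectified-flow style) interpolation path, the 14-value layout per Gaussian, the function names, and above all `toy_velocity`, which stands in for the trained transformer and treats the conditioning signal as the target itself rather than as input images.

```python
import numpy as np

# Hypothetical layout: 3 position + 3 scale + 4 rotation + 1 opacity + 3 color.
PARAMS_PER_GAUSSIAN = 14


def interpolate(x0, x1, t):
    """Linear path x_t = (1 - t) * x0 + t * x1 and its constant target velocity.

    During training, a network would be regressed onto the returned velocity.
    """
    x_t = (1.0 - t) * x0 + t * x1
    return x_t, x1 - x0


def guided_velocity(velocity_fn, x, t, cond, guidance_scale):
    """Classifier-free guidance: extrapolate from the unconditional prediction
    toward the conditional one. A scale of 1.0 recovers the conditional field."""
    v_uncond = velocity_fn(x, t, None)
    v_cond = velocity_fn(x, t, cond)
    return v_uncond + guidance_scale * (v_cond - v_uncond)


def sample(velocity_fn, x0, cond, num_steps=50, guidance_scale=1.0):
    """Euler integration of dx/dt = v(x, t) from t = 0 (noise) to t = 1 (data)."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays strictly below 1, so toy_velocity never divides by 0
        x = x + dt * guided_velocity(velocity_fn, x, t, cond, guidance_scale)
    return x


def toy_velocity(x, t, cond):
    """Stand-in for a trained network (assumption, not the paper's model).

    This is the exact flow-matching field when the data distribution is a
    single point: `cond` if given, otherwise the origin.
    """
    target = np.zeros_like(x) if cond is None else cond
    return (target - x) / (1.0 - t)


rng = np.random.default_rng(0)
noise = rng.standard_normal((8, PARAMS_PER_GAUSSIAN))   # 8 Gaussians of pure noise
target = rng.standard_normal((8, PARAMS_PER_GAUSSIAN))  # toy "ground-truth" set
gaussians = sample(toy_velocity, noise, target, num_steps=50, guidance_scale=1.0)
```

Because the Gaussians are treated as an unordered set of parameter vectors rather than one vector per pixel or voxel, nothing in this loop ties a Gaussian to a grid cell. In a real system the velocity network would consume image features as conditioning, and the photometric gradient guidance mentioned above would add a further term to the velocity; neither is modeled here.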