Relying on in-domain annotations and precise sensor-rig priors, existing 3D occupancy prediction methods are limited in both scalability and out-of-domain generalization. While recent visual geometry foundation models generalize well, they were designed for general-purpose use and lack one or more of the key ingredients required for urban occupancy prediction: metric prediction, geometry completion in cluttered scenes, and adaptation to urban scenarios. We address this gap with OccAny, the first unconstrained urban 3D occupancy model capable of operating on out-of-domain, uncalibrated scenes to predict and complete metric occupancy coupled with segmentation features. OccAny is versatile and can predict occupancy from sequential, monocular, or surround-view images. Our contributions are three-fold: (i) the first generalized 3D occupancy framework; (ii) Segmentation Forcing, which improves occupancy quality while enabling mask-level prediction; and (iii) a Novel View Rendering pipeline that infers novel-view geometry to enable test-time view augmentation for geometry completion. Extensive experiments demonstrate that OccAny outperforms all visual geometry baselines on the 3D occupancy prediction task while remaining competitive with in-domain self-supervised methods across three input settings on two established urban occupancy prediction datasets. Our code is available at https://github.com/valeoai/OccAny.