Existing 3D occupancy prediction methods rely on in-domain annotations and precise sensor-rig priors, which limits both their scalability and their out-of-domain generalization. While recent visual geometry foundation models exhibit strong generalization capabilities, they were designed mainly for general purposes and lack one or more key ingredients required for urban occupancy prediction, namely metric prediction, geometry completion in cluttered scenes, and adaptation to urban scenarios. We address this gap and present OccAny, the first unconstrained urban 3D occupancy model capable of operating on out-of-domain uncalibrated scenes to predict and complete metric occupancy coupled with segmentation features. OccAny is versatile and can predict occupancy from sequential, monocular, or surround-view images. Our contributions are three-fold: (i) we propose the first generalized 3D occupancy framework, (ii) we introduce Segmentation Forcing, which improves occupancy quality while enabling mask-level prediction, and (iii) we design a Novel View Rendering pipeline that infers novel-view geometry to enable test-time view augmentation for geometry completion. Extensive experiments demonstrate that OccAny outperforms all visual geometry baselines on the 3D occupancy prediction task, while remaining competitive with in-domain self-supervised methods across three input settings on two established urban occupancy prediction datasets. Our code is available at https://github.com/valeoai/OccAny .