In this paper, we explore a novel point representation for 3D occupancy prediction from multi-view images, named Occupancy as Set of Points. Existing camera-based methods tend to rely on dense volume-based representations to predict the occupancy of the whole scene, making it difficult to focus on specific areas or areas outside the perception range. In contrast, we introduce Points of Interest (PoIs) to represent the scene and propose OSP, a novel framework for point-based 3D occupancy prediction. Owing to the inherent flexibility of the point-based representation, OSP achieves strong performance compared with existing methods and excels in training and inference adaptability. It extends beyond traditional perception boundaries and can be seamlessly integrated with volume-based methods to significantly enhance their effectiveness. Experiments on the Occ3D nuScenes occupancy benchmark demonstrate that OSP offers both strong performance and flexibility. Code and models are available at \url{https://github.com/hustvl/osp}.