Semantic Scene Completion (SSC) from monocular RGB images is a fundamental yet challenging task due to the inherent ambiguity of inferring occluded 3D geometry from a single view. While feed-forward methods have made progress, they often struggle to generate plausible details in occluded regions and to preserve the fundamental spatial relationships of objects. Accurate generative reasoning over the entire 3D space is critical for real-world applications. In this paper, we present FlowSSC, the first generative framework applied directly to monocular semantic scene completion. FlowSSC treats the SSC task as a conditional generation problem and integrates seamlessly with existing feed-forward SSC methods to significantly boost their performance. To achieve real-time inference without compromising quality, we introduce Shortcut Flow-matching, which operates in a compact triplane latent space. Unlike standard diffusion models that require hundreds of sampling steps, our method uses a shortcut mechanism to achieve high-fidelity generation in a single step, enabling practical deployment in autonomous systems. Extensive experiments on SemanticKITTI demonstrate that FlowSSC achieves state-of-the-art performance, significantly outperforming existing baselines.
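The single-step sampling idea can be illustrated with a minimal sketch of shortcut flow-matching inference. This is a generic illustration under assumed names (`ShortcutVelocityNet`, `sample`), not the FlowSSC architecture, whose details are not given in the abstract: a velocity network is conditioned on both the current time t and the step size d, so at inference the full trajectory from noise to data can be traversed with d = 1 in one network evaluation.

```python
import torch
import torch.nn as nn

class ShortcutVelocityNet(nn.Module):
    """Toy velocity field conditioned on time t and step size d.

    Hypothetical stand-in for a latent-space denoiser; the real model
    would additionally take image-derived conditioning.
    """
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 2, 128), nn.SiLU(), nn.Linear(128, dim)
        )

    def forward(self, x, t, d):
        # Broadcast scalar (t, d) conditioning across the batch.
        cond = torch.stack([t, d], dim=-1).expand(x.shape[0], 2)
        return self.net(torch.cat([x, cond], dim=-1))

@torch.no_grad()
def sample(model, x0, steps=1):
    """Integrate from noise x0 toward data in `steps` shortcut steps.

    With steps=1 this is the single-step regime: x1 = x0 + s(x0, t=0, d=1).
    """
    x, d = x0, 1.0 / steps
    for i in range(steps):
        t = i * d
        x = x + d * model(x, torch.tensor(t), torch.tensor(d))
    return x

model = ShortcutVelocityNet(dim=16)
# One-step generation: a single forward pass maps noise to a latent sample.
latent = sample(model, torch.randn(4, 16), steps=1)
```

At training time, shortcut models additionally enforce a self-consistency target so that one large step of size 2d matches two consecutive steps of size d, which is what allows step counts as low as one at inference.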