We introduce StereoSpace, a diffusion-based framework for monocular-to-stereo synthesis that models geometry purely through viewpoint conditioning, without explicit depth or warping. A canonical rectified space and the conditioning guide the generator to infer correspondences and fill disocclusions end-to-end. To ensure fair and leakage-free evaluation, we introduce an end-to-end protocol that excludes any ground truth or proxy geometry estimates at test time. The protocol emphasizes metrics reflecting downstream relevance: iSQoE for perceptual comfort and MEt3R for geometric consistency. StereoSpace surpasses other methods from the warp & inpaint, latent-warping, and warped-conditioning categories, achieving sharp parallax and strong robustness on layered and non-Lambertian scenes. This establishes viewpoint-conditioned diffusion as a scalable, depth-free solution for stereo generation.