We introduce StereoSpace, a diffusion-based framework for monocular-to-stereo synthesis that models geometry purely through viewpoint conditioning, without explicit depth or warping. A canonical rectified space and the viewpoint conditioning guide the generator to infer correspondences and fill disocclusions end-to-end. To ensure fair and leakage-free evaluation, we introduce an end-to-end protocol that excludes any ground-truth or proxy geometry estimates at test time. The protocol emphasizes metrics reflecting downstream relevance: iSQoE for perceptual comfort and MEt3R for geometric consistency. StereoSpace surpasses other methods from the warp & inpaint, latent-warping, and warped-conditioning categories, achieving sharp parallax and strong robustness on layered and non-Lambertian scenes. This establishes viewpoint-conditioned diffusion as a scalable, depth-free solution for stereo generation.