Real-time free-viewpoint rendering must balance multi-camera redundancy against the latency constraints of interactive applications. We address this challenge by combining lightweight geometry with learning and propose 3DTV, a feedforward network for real-time sparse-view interpolation. A Delaunay-based triplet selection ensures angular coverage for each target view. Building on this, we introduce a pose-aware depth module that estimates a coarse-to-fine depth pyramid, enabling efficient feature reprojection and occlusion-aware blending. Unlike methods that require scene-specific optimization, 3DTV runs feedforward without retraining, making it practical for AR/VR, telepresence, and other interactive applications. Experiments on challenging multi-view video datasets show that 3DTV consistently achieves a strong balance of quality and efficiency, outperforming recent real-time novel-view baselines. Crucially, 3DTV avoids explicit proxies, enabling robust rendering across diverse scenes and making it a practical solution for low-latency multi-view streaming and interactive rendering. Project Page: https://stefanmschulz.github.io/3DTV_webpage/
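To make the Delaunay-based triplet selection more concrete, the following is a minimal sketch and not the authors' implementation. It assumes each camera and the target view can be represented as a 2D point (for example, viewing angles or positions projected onto a plane), and it uses a brute-force empty-circumcircle test rather than an optimized triangulation. The names `select_triplet`, `orient`, `in_circumcircle`, and `contains` are hypothetical.

```python
from itertools import combinations

EPS = 1e-12  # tolerance for degenerate and boundary cases


def orient(a, b, c):
    """Twice the signed area of triangle abc (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_circumcircle(a, b, c, p):
    """True if p lies strictly inside the circumcircle of CCW triangle abc."""
    ax, ay = a[0] - p[0], a[1] - p[1]
    bx, by = b[0] - p[0], b[1] - p[1]
    cx, cy = c[0] - p[0], c[1] - p[1]
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    return det > EPS


def contains(a, b, c, p):
    """True if p lies inside or on the boundary of triangle abc."""
    d = (orient(a, b, p), orient(b, c, p), orient(c, a, p))
    return not (min(d) < -EPS and max(d) > EPS)


def select_triplet(cams, target):
    """Return indices of a Delaunay triangle enclosing target, or None.

    cams: list of (x, y) points, an assumed 2D parameterization of the cameras.
    target: (x, y) point for the view to synthesize.
    """
    n = len(cams)
    for i, j, k in combinations(range(n), 3):
        a, b, c = cams[i], cams[j], cams[k]
        area2 = orient(a, b, c)
        if abs(area2) < EPS:
            continue  # collinear cameras do not form a triangle
        if area2 < 0:
            b, c = c, b  # enforce counter-clockwise order for the circle test
        others = (cams[m] for m in range(n) if m not in (i, j, k))
        if any(in_circumcircle(a, b, c, q) for q in others):
            continue  # another camera is inside the circumcircle: not Delaunay
        if contains(a, b, c, target):
            return (i, j, k)
    return None  # target lies outside the convex hull of the cameras
```

Under these assumptions, a target that falls inside the camera hull is assigned the three cameras whose Delaunay triangle surrounds it, which is one way to obtain source views on all sides of the target. The brute-force search is quartic in the number of cameras, so a practical system would more likely precompute the triangulation once per camera rig.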