Light fields (LFs), which record comprehensive scene radiance across angular dimensions, are widely used in 3D reconstruction, virtual reality, and computational photography. However, LF acquisition is inevitably time-consuming and resource-intensive, because mainstream acquisition strategies rely on manual capture or laborious software synthesis. To address this challenge, we introduce LFdiff, a straightforward yet effective diffusion-based generative framework for LF synthesis that takes only a single RGB image as input. LFdiff leverages disparity estimated by a monocular depth estimation network and incorporates two components tailored to LF data: a novel condition scheme and a noise estimation network. Specifically, we design a position-aware warping condition scheme that strengthens inter-view geometry learning through a robust conditional signal. We then propose DistgUnet, a disentanglement-based noise estimation network, to exploit comprehensive LF representations. Extensive experiments show that LFdiff synthesizes visually pleasing, disparity-controllable light fields with enhanced generalization capability. Furthermore, comprehensive results confirm that the generated LF data are broadly applicable to downstream tasks such as LF super-resolution and refocusing.
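The abstract does not say how the warping condition is constructed. Purely as an illustration, the NumPy sketch below shows the standard disparity-based geometry such a condition builds on: the sub-aperture view at angular offset (du, dv) is approximated by backward-warping the single input image by the disparity scaled by that offset. The sign convention, nearest-neighbour sampling, reuse of the input-view disparity for every view, and the helper `position_aware_condition` (which simply appends normalized (u, v) channels) are all assumptions for illustration, not LFdiff's actual scheme. `refocus` is classic shift-and-add refocusing, included to show one downstream use of the resulting light field.

```python
import numpy as np


def warp_to_view(image, disparity, du, dv):
    """Backward-warp the input (center) view to the view at angular offset (du, dv).

    Assumed convention: view(y, x) = center(y + dv * d, x + du * d), using the
    center-view disparity d as an approximation of the target-view disparity.
    Nearest-neighbour sampling with border clamping; occlusions are ignored.
    """
    h, w = disparity.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_x = np.clip(np.rint(xs + du * disparity), 0, w - 1).astype(int)
    src_y = np.clip(np.rint(ys + dv * disparity), 0, h - 1).astype(int)
    return image[src_y, src_x]


def synthesize_lf(image, disparity, n_ang=5):
    """Warp one image into an n_ang x n_ang grid of sub-aperture views."""
    c = n_ang // 2
    lf = np.empty((n_ang, n_ang) + image.shape, dtype=image.dtype)
    for v in range(n_ang):
        for u in range(n_ang):
            lf[v, u] = warp_to_view(image, disparity, u - c, v - c)
    return lf


def position_aware_condition(image, disparity, du, dv, n_ang=5):
    """Hypothetical condition: warped view plus normalized (u, v) position channels."""
    warped = warp_to_view(image, disparity, du, dv).astype(np.float64)
    if warped.ndim == 2:
        warped = warped[..., None]  # add a channel axis for grayscale input
    h, w = disparity.shape
    scale = max(n_ang // 2, 1)  # maps angular offsets into [-1, 1]
    u_map = np.full((h, w, 1), du / scale)
    v_map = np.full((h, w, 1), dv / scale)
    return np.concatenate([warped, u_map, v_map], axis=-1)


def refocus(lf, alpha):
    """Shift-and-add refocusing: align all views at disparity alpha, then average."""
    n_v, n_u = lf.shape[:2]
    cv, cu = n_v // 2, n_u // 2
    h, w = lf.shape[2:4]
    shift = np.full((h, w), -float(alpha))  # constant disparity that undoes each view's offset
    views = [warp_to_view(lf[v, u], shift, u - cu, v - cv)
             for v in range(n_v) for u in range(n_u)]
    return np.mean(views, axis=0)
```

For example, with a constant disparity map `np.ones((h, w))`, `synthesize_lf(img, disp, n_ang=3)` returns a `(3, 3, h, w, C)` array whose center view equals the input, and `refocus(lf, 1.0)` recovers the input away from the borders. Pure warping like this ignores occlusions and stretches pixels at depth discontinuities; a learned model conditioned on such a signal can, in principle, fill in those regions.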