Multi-task visual learning is a critical aspect of computer vision. Current research, however, predominantly concentrates on the multi-task dense prediction setting, which overlooks the intrinsic 3D world and its multi-view consistent structures, and lacks the capability for versatile imagination. In response to these limitations, we present a novel problem setting, multi-task view synthesis (MTVS), which reinterprets multi-task prediction as a set of novel-view synthesis tasks for multiple scene properties, including RGB. To tackle the MTVS problem, we propose MuvieNeRF, a framework that incorporates both multi-task and cross-view knowledge to simultaneously synthesize multiple scene properties. MuvieNeRF integrates two key modules, Cross-Task Attention (CTA) and Cross-View Attention (CVA), which enable the efficient use of information across multiple views and tasks. Extensive evaluation on both synthetic and realistic benchmarks demonstrates that MuvieNeRF can simultaneously synthesize different scene properties with promising visual quality, even outperforming conventional discriminative models in various settings. Notably, we show that MuvieNeRF is applicable across a range of NeRF backbones. Our code is available at https://github.com/zsh2000/MuvieNeRF.