Neural view synthesis (NVS) is one of the most successful techniques for synthesizing free-viewpoint videos, capable of achieving high fidelity from only a sparse set of captured images. This success has led to many variants of the technique, each evaluated on a set of test views, typically using image quality metrics such as PSNR, SSIM, or LPIPS. However, little research has examined how NVS methods perform with respect to perceived video quality. We present the first study on the perceptual evaluation of NVS and NeRF variants. For this study, we collected two datasets of scenes captured in a controlled lab environment as well as in the wild. In contrast to existing datasets, these scenes come with reference video sequences, allowing us to test for temporal artifacts and subtle distortions that are easily overlooked when viewing only static images. We measured the quality of videos synthesized by several NVS methods both in a well-controlled perceptual quality assessment experiment and with many existing state-of-the-art image and video quality metrics. We present a detailed analysis of the results and offer recommendations for dataset and metric selection in NVS evaluation.