Video frame interpolation is a fundamental tool for temporal video enhancement, but existing quality metrics struggle to evaluate the perceptual impact of interpolation artefacts effectively. Metrics such as PSNR, SSIM and LPIPS ignore temporal coherence. State-of-the-art quality metrics tailored to video frame interpolation, such as FloLPIPS, have been developed but suffer from computational inefficiency that limits their practical application. We present $\text{PSNR}_{\text{DIV}}$, a novel full-reference quality metric that enhances PSNR through motion divergence weighting, a technique adapted from archival film restoration, where it was developed to detect temporal inconsistencies. Our approach highlights singularities in motion fields, which are then used to weight image errors. Evaluation on the BVI-VFI dataset (180 sequences across multiple frame rates, resolutions and interpolation methods) shows that $\text{PSNR}_{\text{DIV}}$ achieves statistically significant improvements: +0.09 Pearson Linear Correlation Coefficient over FloLPIPS, while being 2.5$\times$ faster and using 4$\times$ less memory. Performance remains consistent across all content categories and is robust to the choice of motion estimator. The efficiency and accuracy of $\text{PSNR}_{\text{DIV}}$ enable fast quality evaluation and practical use as a loss function for training neural networks for video frame interpolation tasks. An implementation of our metric is available at www.github.com/conalld/psnr-div.
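The divergence-weighted PSNR described above can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the function names, the normalisation of the weights, and the use of absolute divergence magnitude are all assumptions; the motion field `flow` would come from an external optical-flow estimator.

```python
import numpy as np

def motion_divergence(flow):
    # flow: (H, W, 2) array of (u, v) motion vectors.
    # Divergence = du/dx + dv/dy; large magnitudes mark singularities
    # (sources/sinks) in the motion field, where interpolation
    # artefacts tend to appear.
    du_dx = np.gradient(flow[..., 0], axis=1)
    dv_dy = np.gradient(flow[..., 1], axis=0)
    return du_dx + dv_dy

def psnr_div(ref, dist, flow, peak=255.0, eps=1e-8):
    # Weight per-pixel squared errors by the absolute divergence of the
    # motion field, so that errors near motion singularities dominate.
    # (Hypothetical weighting scheme; see the paper for the exact one.)
    w = np.abs(motion_divergence(flow))
    w = w / (w.mean() + eps)  # normalise so weights average to ~1
    wmse = np.mean(w * (ref.astype(float) - dist.astype(float)) ** 2)
    return 10.0 * np.log10(peak**2 / (wmse + eps))
```

With uniform weights this reduces to ordinary PSNR, which is one reason such a metric can be cheap enough to serve as a training loss.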