The Bjøntegaard Delta (BD) method, proposed in 2001, has become a popular tool for comparing the compression efficiency of video codecs. It was originally proposed to compute bitrate and quality differences between two rate-distortion (RD) curves using PSNR as the distortion metric. Over the years, many works have calculated and reported BD results using other objective quality metrics such as SSIM and VMAF and, in some cases, even subjective ratings (mean opinion scores). However, the lack of consolidated literature explaining the metric and its evolution, together with the absence of a systematic evaluation under different test conditions, can lead to incorrect interpretation of reported BD results. To this end, this paper presents a detailed tutorial describing the BD method and example cases where the metric might fail. We also provide a detailed history of its evolution, including a discussion of the various improvements and variations proposed over the last 20 years. In addition, we evaluate the various BD methods and their open-source implementations, considering different objective quality metrics and subjective ratings and taking into account different RD characteristics. Based on our results, we present a set of recommendations on using existing BD metrics, along with insights for developing more effective tools for evaluating and comparing codec compression efficiency.
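As a rough illustration of what a BD computation involves, the following Python sketch follows the classic BD-rate formulation as it is commonly implemented: fit a cubic polynomial to log10(bitrate) as a function of PSNR for each codec, integrate both fits over the overlapping PSNR range, and convert the average log-rate gap into a percentage. The abstract does not spell out these steps, so the function names (`bd_rate`, `_cubic_fit`, `_integral`), the restriction to exactly four RD points per codec, and the sample values in the usage note are assumptions made for illustration, not the paper's implementation.

```python
import math


def _cubic_fit(xs, ys):
    """Return coefficients c0..c3 of the cubic through four (x, y) points.

    Hypothetical helper: solves the 4x4 Vandermonde system by Gaussian
    elimination with partial pivoting. The x values must be distinct.
    """
    n = 4
    # Augmented system: sum_k c_k * x**k = y for each point.
    rows = [[x ** k for k in range(n)] + [y] for x, y in zip(xs, ys)]
    for col in range(n):
        # Partial pivoting keeps the elimination numerically stable.
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, n):
            factor = rows[r][col] / rows[col][col]
            for c in range(col, n + 1):
                rows[r][c] -= factor * rows[col][c]
    # Back substitution.
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = rows[r][n] - sum(rows[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = acc / rows[r][r]
    return coeffs


def _integral(coeffs, upper):
    """Integral of the polynomial c0 + c1*x + ... from 0 to `upper`."""
    return sum(c * upper ** (k + 1) / (k + 1) for k, c in enumerate(coeffs))


def bd_rate(rate_ref, psnr_ref, rate_test, psnr_test):
    """Average bitrate difference (in percent) of the test codec relative to
    the reference codec at equal PSNR. Negative values mean bitrate savings.
    """
    sizes = {len(rate_ref), len(psnr_ref), len(rate_test), len(psnr_test)}
    if sizes != {4}:
        raise ValueError("this sketch expects exactly four RD points per codec")

    # Only the PSNR interval covered by both curves is compared.
    lo = max(min(psnr_ref), min(psnr_test))
    hi = min(max(psnr_ref), max(psnr_test))
    if hi <= lo:
        raise ValueError("the two RD curves do not overlap in PSNR")

    # Fit log10(rate) against PSNR, shifted so the integration starts at 0.
    fit_ref = _cubic_fit([q - lo for q in psnr_ref],
                         [math.log10(r) for r in rate_ref])
    fit_test = _cubic_fit([q - lo for q in psnr_test],
                          [math.log10(r) for r in rate_test])

    span = hi - lo
    avg_log_diff = (_integral(fit_test, span) - _integral(fit_ref, span)) / span
    return (10.0 ** avg_log_diff - 1.0) * 100.0
```

For instance, calling `bd_rate([1000, 2000, 4000, 8000], [32.0, 35.0, 38.0, 41.0], [900, 1800, 3600, 7200], [32.0, 35.0, 38.0, 41.0])` on these made-up points, where the test codec needs 10% less bitrate at every PSNR level, yields a value of roughly -10. Because the comparison is restricted to the overlapping PSNR interval and relies on a cubic fit, curves with little overlap or irregular shapes are the kind of cases in which the result should be read with care.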