On the Choice of Perception Loss Function for Learned Video Compression

We study causal, low-latency, sequential video compression when the output is subject to both a mean squared-error (MSE) distortion loss and a perception loss that targets realism. Motivated by prior approaches, we consider two different perception loss functions (PLFs). The first, PLF-JD, compares the joint distribution (JD) of all the video frames up to the current one, while the second, PLF-FMD, compares the framewise marginal distributions (FMD) of the source and the reconstruction. Using information-theoretic analysis and deep-learning-based experiments, we demonstrate that the choice of PLF can have a significant effect on the reconstruction, especially at low bit rates. In particular, while the reconstruction based on PLF-JD better preserves the temporal correlation across frames, it also incurs a significant distortion penalty compared to PLF-FMD and makes it more difficult to recover from errors made in earlier output frames. Although the choice of PLF decisively affects reconstruction quality, we also demonstrate that it may not be essential to commit to a particular PLF during encoding: the choice of PLF can be delegated to the decoder. In particular, encoded representations generated by training a system to minimize the MSE alone (without either PLF) can be near universal, yielding close-to-optimal reconstructions for either choice of PLF at the decoder. We validate our results using (one-shot) information-theoretic analysis, a detailed study of the rate-distortion-perception tradeoff for the Gauss-Markov source model, and deep-learning-based experiments on the Moving MNIST and KTH datasets.
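The difference between the two PLFs can be illustrated with a small numerical sketch. The example below is not taken from the paper: it assumes a two-frame, zero-mean, unit-variance Gauss-Markov source with correlation `rho`, and it assumes the squared 2-Wasserstein distance as the divergence between distributions (the abstract does not name a specific divergence). The function names `plf_jd` and `plf_fmd` are hypothetical. A reconstruction with the correct per-frame marginals but independent frames scores zero under the framewise criterion while still being penalized under the joint one.

```python
import numpy as np


def psd_sqrt(m):
    """Symmetric square root of a positive semidefinite matrix."""
    w, v = np.linalg.eigh(m)
    w = np.clip(w, 0.0, None)  # guard against tiny negative eigenvalues
    return (v * np.sqrt(w)) @ v.T


def gaussian_w2_squared(cov_a, cov_b):
    """Squared 2-Wasserstein distance between N(0, cov_a) and N(0, cov_b)."""
    root_a = psd_sqrt(cov_a)
    cross = psd_sqrt(root_a @ cov_b @ root_a)
    return float(np.trace(cov_a) + np.trace(cov_b) - 2.0 * np.trace(cross))


def plf_jd(cov_src, cov_rec):
    """Assumed PLF-JD: divergence between the joint laws of all frames."""
    return gaussian_w2_squared(cov_src, cov_rec)


def plf_fmd(cov_src, cov_rec):
    """Assumed PLF-FMD: one divergence per frame, on marginals only."""
    n = cov_src.shape[0]
    return [
        gaussian_w2_squared(cov_src[t:t + 1, t:t + 1], cov_rec[t:t + 1, t:t + 1])
        for t in range(n)
    ]


rho = 0.9  # assumed temporal correlation between the two frames
cov_source = np.array([[1.0, rho], [rho, 1.0]])  # Gauss-Markov source
cov_recon = np.eye(2)  # correct marginals, no temporal correlation

jd_value = plf_jd(cov_source, cov_recon)
fmd_values = plf_fmd(cov_source, cov_recon)

print("PLF-JD  (joint):    ", jd_value)
print("PLF-FMD (per frame):", fmd_values)
```

Under these assumptions the framewise values vanish because each frame is marginally N(0, 1) in both the source and the reconstruction, whereas the joint value is strictly positive whenever `rho` is nonzero. This is the sense in which PLF-JD constrains temporal correlation and PLF-FMD does not.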

