Many client-side applications, especially games, render video at high resolution and frame rate on power-constrained devices, even when users perceive little or no benefit from the extra pixels. Existing perceptual video quality metrics can indicate when a lower resolution is "good enough", but they are full-reference and computationally expensive, which makes them impractical for real-world use and on-device deployment. In this work, we leverage the spatio-temporal limits of the human visual system and propose a no-reference method that predicts, from the rendered video alone, the lowest resolution that remains perceptually indistinguishable from the best available option. Our approach is codec-agnostic and requires only minimal modifications to existing infrastructure. The predictor is a neural network trained on a large dataset of rendered content labeled with a full-reference perceptual video quality metric. Its predictions significantly enhance perceptual quality while substantially reducing computational cost, suggesting a practical path toward perception-guided, power-efficient client-side rendering.