Many client-side applications, especially games, render video at high resolution and frame rate on power-constrained devices, even when users perceive little or no benefit from the extra pixels. Existing perceptual video quality metrics can indicate when a lower resolution is "good enough", but they are full-reference and computationally expensive, which makes them impractical for real-world, on-device deployment. In this work, we leverage the spatio-temporal limits of the human visual system and propose a no-reference method that predicts, from the rendered video alone, the lowest resolution that remains perceptually indistinguishable from the best available option. Our approach is codec-agnostic and requires only minimal modifications to existing infrastructure. The prediction network is trained on a large dataset of rendered content labeled with a full-reference perceptual video quality metric. The prediction significantly enhances perceptual quality while substantially reducing computational costs, suggesting a practical path toward perception-guided, power-efficient client-side rendering.
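The labeling step mentioned in the abstract, where training targets come from a full-reference metric, could look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' pipeline: the function name `lowest_indistinguishable_resolution`, the tolerance `max_quality_drop`, and the example scores are all hypothetical. It assumes the full-reference metric has already produced one score per candidate resolution, with higher scores meaning closer to the reference.

```python
from typing import Mapping


def lowest_indistinguishable_resolution(
    scores: Mapping[int, float],
    max_quality_drop: float = 1.0,
) -> int:
    """Pick the smallest resolution whose quality score stays close to the best one.

    scores maps a vertical resolution (e.g. 1080) to a full-reference quality
    score for that resolution (hypothetical scale, higher is better).
    max_quality_drop is an assumed tolerance below which the difference is
    treated as imperceptible.
    """
    if not scores:
        raise ValueError("scores must not be empty")

    # The best available option is taken to be the highest resolution.
    best_resolution = max(scores)
    reference_score = scores[best_resolution]

    # Keep every resolution whose score is within the tolerance of the best.
    acceptable = [
        resolution
        for resolution, score in scores.items()
        if reference_score - score <= max_quality_drop
    ]
    return min(acceptable)


# Made-up scores for one clip, for illustration only.
example_scores = {540: 7.2, 720: 8.9, 1080: 9.6, 1440: 9.9, 2160: 10.0}
label = lowest_indistinguishable_resolution(example_scores, max_quality_drop=0.5)
print(label)  # 1080
```

A label produced this way would then serve as the target for a no-reference predictor that only sees the rendered frames, so the expensive full-reference metric is needed at training time but not on the device.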