As deep convolutional neural networks (DNNs) are widely used across computer vision, leveraging the overfitting ability of DNNs to upscale video resolution has become a new trend in modern video delivery systems. By dividing a video into chunks and overfitting each chunk with a super-resolution model, the server encodes the video before transmitting it to clients, thus achieving better video quality and transmission efficiency. However, a large number of chunks is needed to ensure good overfitting quality, which substantially increases storage and consumes more bandwidth for data transmission. On the other hand, reducing the number of chunks through training optimization techniques usually requires high model capacity, which significantly slows down execution. To reconcile these competing demands, we propose a novel method for high-quality and efficient video resolution upscaling that leverages spatial-temporal information to accurately divide video into chunks, thus keeping both the number of chunks and the model size to a minimum. Additionally, we advance our method into a single overfitting model through a data-aware joint training technique, which further reduces the storage requirement with a negligible quality drop. We deploy our models on an off-the-shelf mobile phone, and experimental results show that our method achieves real-time video super-resolution with high video quality. Compared with the state of the art, our method achieves a streaming speed of 28 fps with a PSNR of 41.6 dB, which is 14$\times$ faster and 2.29 dB better in live video resolution upscaling tasks. Code is available at https://github.com/coulsonlee/STDO-CVPR2023.git
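To make the chunking idea more concrete, the following is a minimal sketch, not the authors' confirmed procedure. It assumes that frames are cut into fixed-size patches, that each patch receives a difficulty score (here, the PSNR of some cheap upscaled estimate against the ground truth), and that patches are then sorted by score and cut into nearly equal groups, one per chunk. The function names `psnr`, `split_into_patches`, and `group_by_score`, as well as the equal-size binning rule, are hypothetical choices made for illustration.

```python
import numpy as np


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two same-shaped arrays."""
    ref = np.asarray(reference, dtype=np.float64)
    est = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((ref - est) ** 2)
    if mse == 0.0:
        return float("inf")  # identical inputs
    return 10.0 * np.log10(peak ** 2 / mse)


def split_into_patches(frame, patch):
    """Cut a frame into non-overlapping patch x patch tiles (ragged edges dropped)."""
    h, w = frame.shape[:2]
    return [
        frame[y:y + patch, x:x + patch]
        for y in range(0, h - patch + 1, patch)
        for x in range(0, w - patch + 1, patch)
    ]


def group_by_score(scores, num_chunks):
    """Assumed rule: sort patch indices by score, then cut into nearly equal groups."""
    order = np.argsort(scores, kind="stable")
    return [part.tolist() for part in np.array_split(order, num_chunks)]
```

Under these assumptions, patches gathered from all frames of a video would be scored once, and each resulting group would be used to overfit its own small super-resolution model, so that patches of similar difficulty share a model regardless of where or when they appear in the video.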