Towards High-Quality and Efficient Video Super-Resolution via Spatial-Temporal Data Overfitting

As deep convolutional neural networks (DNNs) are widely used in various fields of computer vision, leveraging the overfitting ability of DNNs to achieve video resolution upscaling has become a new trend in modern video delivery systems. By dividing videos into chunks and overfitting each chunk with a super-resolution model, the server encodes videos before transmitting them to the clients, thus achieving better video quality and transmission efficiency. However, a large number of chunks is needed to ensure good overfitting quality, which substantially increases storage and consumes more bandwidth for data transmission. On the other hand, decreasing the number of chunks through training optimization techniques usually requires high model capacity, which significantly slows down execution speed. To reconcile these competing demands, we propose a novel method for high-quality and efficient video resolution upscaling, which leverages spatial-temporal information to accurately divide video into chunks, thus keeping both the number of chunks and the model size to a minimum. Additionally, we advance our method into a single overfitting model through a data-aware joint training technique, which further reduces the storage requirement with a negligible quality drop. We deploy our models on an off-the-shelf mobile phone, and experimental results show that our method achieves real-time video super-resolution with high video quality. Compared with the state of the art, our method achieves 28 fps streaming speed with 41.6 dB PSNR, which is 14$\times$ faster and 2.29 dB better in live video resolution upscaling tasks. Code is available at https://github.com/coulsonlee/STDO-CVPR2023.git
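The abstract reports quality in PSNR and describes dividing video into chunks using spatial-temporal information, without spelling out the procedure. The following is a minimal sketch, not the authors' implementation: `psnr` is the standard peak signal-to-noise ratio, while `split_into_patches` and `group_patches_by_score` are hypothetical helpers that illustrate one plausible reading, namely cutting frames into patches and grouping patches with similar difficulty scores into the same chunk. The choice of score, the patch size, and the equal-size split are all assumptions made for illustration.

```python
import numpy as np


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two same-shaped arrays."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical inputs
    return 10.0 * np.log10(max_value ** 2 / mse)


def split_into_patches(frame, patch):
    """Cut a frame into non-overlapping patch x patch tiles.

    Hypothetical helper: rows and columns that do not fill a
    whole tile are dropped.
    """
    h, w = frame.shape[:2]
    return [
        frame[y:y + patch, x:x + patch]
        for y in range(0, h - patch + 1, patch)
        for x in range(0, w - patch + 1, patch)
    ]


def group_patches_by_score(scores, num_chunks):
    """Sort patch indices by score and split them into num_chunks groups.

    Hypothetical helper: the score could be, for example, the PSNR of
    each patch under some generic upscaler (an assumption, not a detail
    stated in the abstract).
    """
    order = np.argsort(scores)
    return [chunk.tolist() for chunk in np.array_split(order, num_chunks)]
```

For instance, `group_patches_by_score([30.0, 10.0, 20.0, 40.0], 2)` places the two lowest-scoring patches (indices 1 and 2) in the first chunk and the other two in the second. As a side note on the reported numbers, when the peak value is the same, a PSNR gain of 2.29 dB corresponds to a mean squared error that is lower by a factor of about 1.7, since $10^{0.229} \approx 1.69$.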

Related content

Overfitting

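In AI, overfitting usually refers to a machine-learned model that is too complex: it performs well on the training set but poorly on the test set. Overfitting (also called over-learning) therefore shows up as poor generalization. It arises because the training data contain sampling error, and during parameter fitting a complex model fits that sampling error along with the underlying signal.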
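The effect described above can be illustrated with a small sketch, assuming a toy regression problem that is not taken from the paper: a low-degree and a high-degree polynomial are fitted to ten noisy samples of a sine curve, and their errors on the training and held-out points are compared. The function name `fit_and_score`, the noise level, and the degrees are illustrative choices.

```python
import numpy as np
from numpy.polynomial import Polynomial


def fit_and_score(degree, x_train, y_train, x_test, y_test):
    """Least-squares polynomial fit; returns (train MSE, test MSE)."""
    model = Polynomial.fit(x_train, y_train, degree)
    train_mse = float(np.mean((model(x_train) - y_train) ** 2))
    test_mse = float(np.mean((model(x_test) - y_test) ** 2))
    return train_mse, test_mse


def true_fn(x):
    """Underlying signal that the noisy samples are drawn from."""
    return np.sin(2.0 * np.pi * x)


rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 10)
x_test = np.linspace(0.05, 0.95, 10)
# The noise plays the role of the sampling error in the training data.
y_train = true_fn(x_train) + rng.normal(0.0, 0.2, size=x_train.shape)
y_test = true_fn(x_test) + rng.normal(0.0, 0.2, size=x_test.shape)

results = {
    degree: fit_and_score(degree, x_train, y_train, x_test, y_test)
    for degree in (3, 9)
}
for degree, (train_mse, test_mse) in results.items():
    print(f"degree {degree}: train MSE {train_mse:.2e}, test MSE {test_mse:.2e}")
```

The degree-9 polynomial has enough parameters to pass through all ten training points, so its training error is essentially zero because it has fitted the noise as well. In the video delivery setting of the abstract above, this behavior is exploited deliberately: the content to be delivered is itself the training data, so fitting it closely is the goal and generalization to other videos is not required.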
