Towards High-Quality and Efficient Video Super-Resolution via Spatial-Temporal Data Overfitting

As deep neural networks (DNNs) are widely used across computer vision, leveraging their ability to overfit to upscale video resolution has become a new trend in modern video delivery systems. By dividing a video into chunks and overfitting a super-resolution (SR) model to each chunk, the server encodes the video before transmitting it to clients, achieving better video quality and transmission efficiency. However, a large number of chunks is needed to ensure good overfitting quality, which substantially increases storage and consumes more bandwidth for data transmission. On the other hand, reducing the number of chunks through training optimization techniques usually requires high model capacity, which significantly slows down execution. To reconcile these conflicting requirements, we propose a novel method for high-quality and efficient video resolution upscaling that leverages spatial-temporal information to accurately divide a video into chunks, keeping both the number of chunks and the model size to a minimum. Additionally, we extend our method to a single overfitting model through a data-aware joint training technique, which further reduces the storage requirement with a negligible quality drop. We deploy our models on an off-the-shelf mobile phone, and experimental results show that our method achieves real-time video super-resolution with high video quality. Compared with the state of the art on live video resolution upscaling tasks, our method reaches a streaming speed of 28 FPS at 41.6 dB PSNR, which is 14× faster and 2.29 dB better. Our code is available at: https://github.com/coulsonlee/STDO-CVPR2023.git
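The abstract says chunks are formed from spatial-temporal information rather than fixed time intervals. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation (the linked repository has that). It assumes frames are cut into fixed-size patches, each patch is scored by the PSNR (the metric behind the 41.6 dB figure, `10 * log10(MAX^2 / MSE)`) that a cheap baseline upscaler reaches on it, and the sorted patches are split into equally sized groups. Texture-rich, hard-to-upscale patches from any frame thus end up together. The nearest-neighbour baseline stands in for a pretrained SR model, and `group_patches_by_difficulty`, `upscale_nearest` and `extract_patches` are hypothetical names.

```python
import numpy as np


def psnr(ref: np.ndarray, est: np.ndarray, max_val: float = 1.0) -> float:
    """PSNR in dB: 10 * log10(max_val**2 / MSE); infinite for identical inputs."""
    mse = float(np.mean((ref.astype(np.float64) - est.astype(np.float64)) ** 2))
    if mse == 0.0:
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / mse))


def extract_patches(frame: np.ndarray, size: int):
    """Yield non-overlapping size x size patches in raster order; partial edge patches are dropped."""
    h, w = frame.shape[:2]
    for y in range(0, h - size + 1, size):
        for x in range(0, w - size + 1, size):
            yield frame[y:y + size, x:x + size]


def upscale_nearest(lr: np.ndarray, scale: int) -> np.ndarray:
    """Hypothetical stand-in for a pretrained SR model: nearest-neighbour upscaling."""
    return np.repeat(np.repeat(lr, scale, axis=0), scale, axis=1)


def group_patches_by_difficulty(hr_frames, scale=2, patch=16, num_chunks=2):
    """Score each HR patch by the PSNR the baseline achieves on it, sort ascending,
    and split the patches into num_chunks groups of near-equal size.

    Each entry is (psnr_db, frame_index, patch_index). Group 0 holds the hardest
    (lowest-PSNR) patches, pooled across all frames.
    """
    scored = []
    for t, frame in enumerate(hr_frames):
        for i, hr_patch in enumerate(extract_patches(frame, patch)):
            lr_patch = hr_patch[::scale, ::scale]  # naive decimation (assumption)
            scored.append((psnr(hr_patch, upscale_nearest(lr_patch, scale)), t, i))
    scored.sort(key=lambda item: item[0])
    bounds = np.linspace(0, len(scored), num_chunks + 1).astype(int)
    return [scored[bounds[k]:bounds[k + 1]] for k in range(num_chunks)]


# Usage: a flat frame (exactly reconstructable) and a noisy frame (hard to upscale).
rng = np.random.default_rng(0)
flat = np.full((32, 32), 0.5)
noisy = rng.random((32, 32))
chunks = group_patches_by_difficulty([flat, noisy], scale=2, patch=16, num_chunks=2)
print([[t for _, t, _ in group] for group in chunks])  # → [[1, 1, 1, 1], [0, 0, 0, 0]]
```

Grouping by content difficulty instead of by time means each group's model only has to cover one kind of content, which is the intuition behind keeping both the number of chunks and the model size small.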


Related content

Overfitting


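In AI, overfitting usually means that a learned model is too complex: it performs very well on the training set but poorly on the test set. Overfitting (also called over-learning) therefore shows up as poor generalization. It occurs because the training data contain sampling error, and during training a sufficiently complex model fits that sampling error along with the underlying signal.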
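A small illustration of this definition, as a sketch with assumed data: noisy samples of an assumed linear relationship are fitted with a degree-1 and a degree-9 polynomial. With 10 coefficients for 10 training points, the degree-9 model can pass through every sample and so fits the sampling error itself. Its training error drops to roughly zero while its test error stays well above that; on data like this it usually also does worse on the test set than the straight line. The function `fit_and_score` and all data parameters are hypothetical.

```python
import numpy as np


def fit_and_score(degree, x_train, y_train, x_test, y_test):
    """Least-squares polynomial fit of the given degree; returns (train MSE, test MSE)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = float(np.mean((np.polyval(coeffs, x_train) - y_train) ** 2))
    test_mse = float(np.mean((np.polyval(coeffs, x_test) - y_test) ** 2))
    return train_mse, test_mse


def true_fn(x):
    """Underlying relationship (assumption: a straight line)."""
    return 2.0 * x


rng = np.random.default_rng(0)
x_train = np.linspace(-1.0, 1.0, 10)
y_train = true_fn(x_train) + rng.normal(0.0, 0.3, x_train.shape)  # sampling error
x_test = np.linspace(-1.0, 1.0, 200)
y_test = true_fn(x_test) + rng.normal(0.0, 0.3, x_test.shape)

simple = fit_and_score(1, x_train, y_train, x_test, y_test)
flexible = fit_and_score(9, x_train, y_train, x_test, y_test)  # as many coefficients as points
print(f"degree 1: train={simple[0]:.4f} test={simple[1]:.4f}")
print(f"degree 9: train={flexible[0]:.4f} test={flexible[1]:.4f}")
```

The content-aware setting in the abstract exploits this behaviour on purpose: each SR model is only applied to the video content it was trained on, so fitting that content very closely is the goal rather than a defect.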
