Vector-quantization (VQ) based models have recently demonstrated strong potential for visual prior modeling. However, existing VQ-based methods simply encode visual features with their nearest codebook items and train the index predictor with code-level supervision. Because visual signals are so rich, VQ encoding often leads to large quantization errors. Furthermore, training the predictor with code-level supervision cannot take the final reconstruction error into account, resulting in sub-optimal prior modeling accuracy. In this paper, we address these two issues and propose a Texture Vector-Quantization (TVQ) strategy and a Reconstruction-Aware Prediction (RAP) strategy. The texture vector-quantization strategy leverages the characteristics of the super-resolution (SR) task and introduces a codebook only to model the prior of the missing textures. The reconstruction-aware prediction strategy, in turn, uses the straight-through estimator to train the index predictor directly with image-level supervision. Our proposed generative SR model (TVQ&RAP) delivers photo-realistic SR results at a small computational cost.
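The two mechanisms contrasted above can be sketched in a few lines of NumPy: nearest-codebook VQ encoding (and the quantization error it leaves behind), and a straight-through estimator that lets a reconstruction loss, rather than code-level cross-entropy, train the index predictor. This is a minimal sketch under stated assumptions, not the TVQ&RAP implementation: the function names `vq_encode` and `ste_forward_backward` are hypothetical, the predictor's logits are taken as given, the STE variant shown (hard one-hot forward, softmax backward) is one common choice, and a mean-squared error against a target feature stands in for the image-level loss that would be computed after a decoder.

```python
import numpy as np


def vq_encode(features, codebook):
    """Nearest-codebook VQ (hypothetical helper): map each feature to its closest code."""
    # Squared Euclidean distance between every feature (N, D) and every code (K, D).
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)
    quantized = codebook[indices]
    # Mean squared quantization error left by the hard nearest-code assignment.
    q_error = float(((features - quantized) ** 2).mean())
    return indices, quantized, q_error


def softmax(logits):
    # Subtract the row max for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def ste_forward_backward(logits, codebook, target):
    """Straight-through estimator for a code-index predictor (illustrative sketch).

    Forward: hard one-hot selection of the argmax code, as at inference time.
    Backward: gradients flow as if the selection weights were softmax(logits).
    Loss: MSE between selected codes and `target`, an assumed stand-in for
    an image-level loss computed after a decoder.
    """
    n = logits.shape[0]
    soft = softmax(logits)
    hard = np.zeros_like(soft)
    hard[np.arange(n), logits.argmax(axis=1)] = 1.0
    recon = hard @ codebook  # (N, D): reconstruction uses the hard codes
    diff = recon - target
    loss = float((diff ** 2).mean())
    # Gradient of the mean-squared loss w.r.t. the (N, K) selection weights;
    # the STE passes it unchanged to the soft weights.
    grad_weights = 2.0 * diff @ codebook.T / diff.size
    # Softmax Jacobian-vector product: s * (g - sum(g * s)).
    grad_logits = soft * (
        grad_weights - (grad_weights * soft).sum(axis=1, keepdims=True)
    )
    return loss, grad_logits
```

For a linear predictor `logits = x @ W` (a hypothetical choice), the weight gradient would follow as `x.T @ grad_logits`. The hard one-hot keeps the forward pass identical to the discrete code selection used at inference, while the softmax surrogate supplies a nonzero gradient that raises the logits of codes pointing toward the reconstruction target, which code-level supervision alone does not account for.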