Quality assessment of images and videos depends on both local details and global semantics, whereas general data sampling methods (e.g., resizing, cropping, or grid-based fragments) fail to capture the two simultaneously. To address this deficiency, current approaches adopt multi-branch models that take multi-resolution data as input, which increases model complexity. In this work, instead of stacking up models, we explore a more elegant data sampling method (named SAMA, for scaling and masking) that packs both local and global content into a regular input size. The basic idea is to first scale the data into a pyramid and then reduce the pyramid to a regular data dimension with a masking strategy. Thanks to the spatial and temporal redundancy in images and videos, the processed data retains multi-scale characteristics at a regular input size and can thus be processed by a single-branch model. We verify the sampling method on image and video quality assessment. Experiments show that it significantly improves the performance of current single-branch models and achieves performance competitive with multi-branch models without extra model complexity. The source code will be available at https://github.com/Sissuire/SAMA.
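The scale-then-mask idea can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`resize_nearest`, `scale_and_mask`), the nearest-neighbour resizing, the factor-of-two pyramid, the checkerboard-style level assignment `(i + j) % levels`, and the random in-cell offsets are all assumptions chosen for brevity. The sketch builds a small pyramid, splits every level into the same grid of cells, and fills each output cell with a patch taken from exactly one level, so the result has a fixed size while mixing coarse (global) and fine (local) content.

```python
import numpy as np


def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize using plain NumPy indexing (assumed for simplicity)."""
    in_h, in_w = img.shape[:2]
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return img[rows][:, cols]


def scale_and_mask(img, grid=4, patch=8, levels=2, seed=0):
    """Hypothetical scale-and-mask sampler; returns (sample, level_map).

    The output always has side length grid * patch, whatever the input size.
    """
    rng = np.random.default_rng(seed)
    out_size = grid * patch

    # Scaling: level k has side out_size * 2**k, so level 0 is the coarsest
    # (global view) and higher levels keep progressively finer detail.
    pyramid = [
        resize_nearest(img, out_size * 2**k, out_size * 2**k)
        for k in range(levels)
    ]

    out = np.empty((out_size, out_size) + img.shape[2:], dtype=img.dtype)
    level_map = np.empty((grid, grid), dtype=int)

    # Masking: every grid cell keeps a patch from one level only.
    for i in range(grid):
        for j in range(grid):
            k = (i + j) % levels      # assumed interleaving pattern
            cell = patch * 2**k       # cell side length at level k
            # Random patch position inside the cell (always 0 at level 0).
            dy, dx = rng.integers(0, cell - patch + 1, size=2)
            y0, x0 = i * cell + dy, j * cell + dx
            out[i * patch:(i + 1) * patch, j * patch:(j + 1) * patch] = \
                pyramid[k][y0:y0 + patch, x0:x0 + patch]
            level_map[i, j] = k
    return out, level_map


if __name__ == "__main__":
    image = np.random.default_rng(1).integers(0, 256, (96, 128, 3), dtype=np.uint8)
    sample, level_map = scale_and_mask(image)
    print(sample.shape)
    print(level_map)
```

Because the sample has the same shape for any input resolution, it could be fed to an ordinary single-branch backbone; extending the sketch to video would additionally require choosing levels along the temporal axis, which is left out here.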