Quality assessment and aesthetics assessment aim to evaluate the perceived quality and aesthetics of visual content. Current learning-based methods suffer greatly from the scarcity of labeled data and usually generalize sub-optimally. Masked image modeling (MIM) has achieved noteworthy advances across various high-level tasks (e.g., classification and detection). In this work, we take a novel perspective and investigate its capabilities in terms of quality- and aesthetics-awareness. To this end, we propose Quality- and aesthetics-aware pretraining (QPT V2), the first pretraining framework based on MIM that offers a unified solution to quality and aesthetics assessment. To perceive both high-level semantics and fine-grained details, we curate the pretraining data. To comprehensively cover quality- and aesthetics-related factors, we introduce degradation. To capture multi-scale quality and aesthetic information, we modify the model structure. Extensive experimental results on 11 downstream benchmarks show that QPT V2 outperforms current state-of-the-art approaches and other pretraining paradigms.