Image aesthetics assessment (IAA) aims to estimate the aesthetic quality of images. The criteria for judging aesthetics depend on image content, so different images call for different criteria. Existing works learn image aesthetics with pre-trained vision backbones that encode content knowledge. However, training those backbones is time-consuming and suffers from attention dispersion. Inspired by learnable queries in vision-language alignment, we propose the Image Aesthetics Assessment via Learnable Queries (IAA-LQ) approach. It adapts learnable queries to extract aesthetic features from pre-trained image features obtained from a frozen image encoder. Extensive experiments on real-world data demonstrate the advantages of IAA-LQ, which outperforms the best state-of-the-art method by 2.2% in Spearman rank correlation coefficient (SRCC) and 2.1% in Pearson linear correlation coefficient (PLCC).
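The idea of learnable queries reading from frozen image features can be illustrated with a minimal forward-pass sketch. It is not the IAA-LQ architecture, whose details the abstract does not give. It assumes single-head cross-attention without projection layers, mean pooling over the query outputs, and a linear scoring head. All function names, the toy sizes, and the random values standing in for encoder output and trained parameters are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def query_cross_attention(queries, image_features):
    """Let each query attend over all image tokens (assumed: one head, no projections)."""
    dim = len(queries[0])
    keys_t = [list(col) for col in zip(*image_features)]      # (dim, num_tokens)
    scores = matmul(queries, keys_t)                          # (num_queries, num_tokens)
    weights = [softmax([s / math.sqrt(dim) for s in row]) for row in scores]
    return matmul(weights, image_features), weights           # (num_queries, dim)


def predict_score(aesthetic_features, head_weights, bias=0.0):
    """Mean-pool the query outputs, then apply a linear head (hypothetical head)."""
    n = len(aesthetic_features)
    pooled = [sum(col) / n for col in zip(*aesthetic_features)]
    return sum(w * p for w, p in zip(head_weights, pooled)) + bias


rng = random.Random(0)
num_tokens, dim, num_queries = 6, 4, 2  # toy sizes, not taken from the paper

# Stand-in for the output of a frozen image encoder: never updated.
image_features = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_tokens)]
# The only parameters that would be trained in this sketch.
queries = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_queries)]
head_weights = [rng.gauss(0, 1) for _ in range(dim)]

aesthetic_features, attention = query_cross_attention(queries, image_features)
score = predict_score(aesthetic_features, head_weights)
```

In this sketch only `queries` and `head_weights` would receive gradient updates. The image features are treated as fixed inputs, which reflects the frozen-encoder setting described above.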
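The two reported metrics can be sketched in a few lines. PLCC is the Pearson correlation between predicted and ground-truth scores, and SRCC is the Pearson correlation between their ranks. The scores below are made up for illustration and are not results from the paper. The helper names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC) of two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the full run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation coefficient (SRCC): Pearson on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up aesthetic scores, for illustration only.
predicted = [5.1, 6.3, 4.0, 7.2, 5.8]
ground_truth = [5.0, 6.0, 4.5, 7.5, 5.5]

plcc = pearson(predicted, ground_truth)
srcc = spearman(predicted, ground_truth)
```

The two toy lists order the five images identically, so SRCC reaches its maximum of 1, while PLCC stays below 1 because the relationship between the scores is not exactly linear. SRCC rewards correct ranking and PLCC rewards linear agreement in score values, which is why the two are reported together.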