Image Aesthetic Assessment (IAA) is a long-standing and challenging research task. However, its sub-task, Human Image Aesthetic Assessment (HIAA), has been scarcely explored, even though HIAA is widely used in social media, AI workflows, and related domains. To bridge this research gap, our work pioneers a holistic implementation framework tailored for HIAA. Specifically, we introduce HumanBeauty, the first dataset purpose-built for HIAA, which comprises 108K high-quality human images with manual annotations. To enable comprehensive and fine-grained HIAA, 50K human images are manually collected through a rigorous curation process and annotated using our novel 12-dimensional aesthetic standard, while the remaining 58K images, which carry overall aesthetic labels, are systematically filtered from public datasets. Based on the HumanBeauty dataset, we propose HumanAesExpert, a powerful Vision Language Model for the aesthetic evaluation of human images. We design an Expert head that incorporates human knowledge of aesthetic sub-dimensions and is used jointly with the Language Modeling (LM) and Regression heads. This approach enables our model to achieve superior performance in both overall and fine-grained HIAA. Furthermore, we introduce a MetaVoter, which aggregates the scores from all three heads to balance their respective strengths, thereby improving assessment precision. Extensive experiments demonstrate that our HumanAesExpert models deliver significantly better HIAA performance than other state-of-the-art models. Our datasets, models, and code are publicly released to advance the HIAA community. Project webpage: https://humanaesexpert.github.io/HumanAesExpert/
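The abstract states only that the MetaVoter aggregates the scores of the LM, Regression, and Expert heads; it does not say how. The sketch below illustrates one simple way such an aggregator could work, assuming a convex weighted average of the three head scores whose weights are chosen by grid search to minimize mean squared error on held-out labels. The names `meta_vote` and `fit_weights`, the grid step, and the weighting scheme are all hypothetical; the actual MetaVoter may instead be a learned network.

```python
def meta_vote(scores, weights):
    """Hypothetical aggregator: weighted sum of (lm, regression, expert) scores."""
    return sum(w * s for w, s in zip(weights, scores))


def fit_weights(head_scores, targets, step=0.1):
    """Grid-search convex weights (non-negative, summing to 1) that minimize
    mean squared error between aggregated scores and human labels.

    head_scores: list of (lm, regression, expert) score tuples, one per image.
    targets: list of ground-truth overall aesthetic scores.
    """
    n = round(1 / step)
    best, best_err = None, float("inf")
    for i in range(n + 1):
        for j in range(n + 1 - i):
            k = n - i - j  # third weight is fixed by the sum-to-one constraint
            w = (i / n, j / n, k / n)
            err = sum(
                (meta_vote(s, w) - t) ** 2 for s, t in zip(head_scores, targets)
            ) / len(targets)
            if err < best_err:
                best, best_err = w, err
    return best, best_err


# Toy validation set where the regression head happens to match the labels exactly.
val_scores = [(0.2, 0.5, 0.9), (0.8, 0.3, 0.1), (0.5, 0.7, 0.4)]
val_targets = [0.5, 0.3, 0.7]
weights, mse = fit_weights(val_scores, val_targets)
print(weights, mse)
```

Constraining the weights to be convex keeps the aggregated score on the same scale as the individual heads, so the combined prediction can be read like any single head's output.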