HumanAesExpert: Advancing a Multi-Modality Foundation Model for Human Image Aesthetic Assessment

Image Aesthetic Assessment (IAA) is a long-standing and challenging research task. Its subset, Human Image Aesthetic Assessment (HIAA), has scarcely been explored, even though HIAA is widely used in social media, AI workflows, and related domains. To bridge this gap, we present a holistic implementation framework tailored to HIAA. Specifically, we introduce HumanBeauty, the first dataset purpose-built for HIAA, which comprises 108K high-quality human images with manual annotations. To support comprehensive and fine-grained HIAA, 50K of these images were manually collected through a rigorous curation process and annotated using our 12-dimensional aesthetic standard, while the remaining 58K, which carry overall aesthetic labels only, were systematically filtered from public datasets. Based on HumanBeauty, we propose HumanAesExpert, a Vision Language Model for aesthetic evaluation of human images. We design an Expert head that incorporates human knowledge of aesthetic sub-dimensions and use it jointly with the Language Modeling (LM) head and the Regression head. This design enables the model to perform both overall and fine-grained HIAA. Furthermore, we introduce a MetaVoter, which aggregates the scores from all three heads to balance the capabilities of each head and thereby improve assessment precision. Extensive experiments demonstrate that our HumanAesExpert models deliver significantly better performance in HIAA than other state-of-the-art models. Our datasets, models, and code are publicly released to advance the HIAA community. Project webpage: https://humanaesexpert.github.io/HumanAesExpert/
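
The abstract states only that the MetaVoter aggregates the scores of the LM, Regression, and Expert heads; it does not say how. As a rough illustration of what such an aggregation step could look like, the sketch below combines three head scores with softmax-normalized weights. Everything here is an assumption made for illustration: the names `HeadScores` and `meta_vote`, the use of a weighted average, and the example scores are hypothetical and are not taken from the released code.

```python
import math
from dataclasses import dataclass


@dataclass
class HeadScores:
    """Scores from the three heads named in the abstract (hypothetical container)."""
    lm: float          # score derived from the Language Modeling head
    regression: float  # score from the Regression head
    expert: float      # overall score from the Expert head


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def meta_vote(scores, logits=(0.0, 0.0, 0.0)):
    """Aggregate three head scores into one score.

    ASSUMPTION: a convex combination with softmax weights. The real
    MetaVoter may use a different (for example, learned) aggregation.
    """
    weights = softmax(list(logits))
    values = [scores.lm, scores.regression, scores.expert]
    return sum(w * v for w, v in zip(weights, values))


if __name__ == "__main__":
    example = HeadScores(lm=0.6, regression=0.7, expert=0.8)
    # Equal logits reduce to a plain average of the three heads.
    print(meta_vote(example))
    # Larger logits shift the result toward the corresponding head.
    print(meta_vote(example, logits=(0.0, 0.0, 2.0)))
```

Because the weights form a convex combination, the aggregated score always lies between the lowest and highest head score, which keeps a single outlying head from dominating the result.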

