SDFD: Building a Versatile Synthetic Face Image Dataset with Diverse Attributes

AI systems rely on extensive training on large datasets to address a wide range of tasks. However, image-based systems, particularly those used for demographic attribute prediction, face a significant data limitation: many current face image datasets focus primarily on demographic factors such as age, gender, and skin tone, and overlook other important facial attributes such as hairstyle and accessories. This narrow focus limits the diversity of the data and, consequently, the robustness of AI systems trained on it. This work addresses that limitation by proposing a methodology for generating synthetic face image datasets that capture a broader spectrum of facial diversity. Specifically, our approach uses a systematic prompt formulation strategy that covers not only demographics and biometrics but also non-permanent traits such as make-up, hairstyle, and accessories. These prompts guide a state-of-the-art text-to-image model to generate a comprehensive dataset of high-quality, realistic images that can serve as an evaluation set for face analysis systems. Compared to existing datasets, our dataset is equally or more challenging for image classification tasks while being much smaller in size.
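One way to picture a systematic prompt formulation strategy is to enumerate combinations of attribute values and fill a fixed text template with each combination. The following is a minimal Python sketch under that assumption. The attribute vocabularies (`ATTRIBUTES`), the template (`TEMPLATE`), and the helper `build_prompts` are all hypothetical illustrations, not the lists, wording, or code used for SDFD. The call to the text-to-image model is omitted.

```python
from itertools import product

# Hypothetical attribute vocabularies: illustrative values only, not the
# attribute lists actually used to build SDFD.
ATTRIBUTES = {
    "age": ["young adult", "middle-aged", "elderly"],
    "gender": ["man", "woman"],
    "skin_tone": ["light", "medium", "dark"],
    "hairstyle": ["short curly hair", "long straight hair", "a shaved head"],
    "makeup": ["no make-up", "bold lipstick"],
    "accessory": ["no accessories", "glasses", "a headscarf"],
}

# Hypothetical template that combines demographic, biometric, and
# non-permanent traits into a single text-to-image prompt.
TEMPLATE = (
    "A realistic portrait photo of a {age} {gender} with {skin_tone} skin, "
    "{hairstyle}, {makeup}, wearing {accessory}"
)


def build_prompts(attributes, template):
    """Fill the template once for every combination of attribute values."""
    keys = list(attributes)
    prompts = []
    # itertools.product yields one tuple per combination (Cartesian product).
    for values in product(*(attributes[k] for k in keys)):
        fields = dict(zip(keys, values))
        prompts.append(template.format(**fields))
    return prompts


prompts = build_prompts(ATTRIBUTES, TEMPLATE)
print(len(prompts))  # 324 = 3 * 2 * 3 * 3 * 2 * 3
print(prompts[0])
```

Full enumeration grows multiplicatively with every added attribute, so a practical pipeline might sample a subset of combinations instead. The abstract does not say which approach the authors use.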
