SDFD: Building a Versatile Synthetic Face Image Dataset with Diverse Attributes

AI systems rely on extensive training on large datasets to address various tasks. However, image-based systems, particularly those used for demographic attribute prediction, face significant challenges. Many current face image datasets focus primarily on demographic factors such as age, gender, and skin tone, overlooking other important facial attributes like hairstyle and accessories. This narrow focus limits the diversity of the data and, consequently, the robustness of AI systems trained on it. This work addresses this limitation by proposing a methodology for generating synthetic face image datasets that capture a broader spectrum of facial diversity. Specifically, our approach integrates a systematic prompt formulation strategy that covers not only demographics and biometrics but also non-permanent traits such as make-up, hairstyle, and accessories. These prompts guide a state-of-the-art text-to-image model in generating a comprehensive dataset of high-quality, realistic images, which can serve as an evaluation set for face analysis systems. Compared to existing datasets, our proposed dataset proves as challenging or more challenging in image classification tasks while being much smaller in size.
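To make the idea of a systematic prompt formulation strategy concrete, the sketch below enumerates every combination of a few attribute values and fills a prompt template for a text-to-image model. It is a minimal illustration under stated assumptions, not the authors' actual method: the attribute names, their values, and the template wording (`ATTRIBUTES`, `TEMPLATE`, `build_prompts`) are all hypothetical, chosen only to show how demographic and non-permanent traits can be combined in one prompt.

```python
from itertools import product

# Hypothetical attribute vocabularies (NOT the lists used in the paper).
# Demographic/biometric traits and non-permanent traits sit side by side.
ATTRIBUTES = {
    "age": ["young adult", "middle-aged", "senior"],
    "gender": ["woman", "man"],
    "skin_tone": ["light", "medium", "dark"],
    "hairstyle": ["short curly hair", "long straight hair"],
    "makeup": ["no make-up", "bold make-up"],
    "accessory": ["no accessories", "glasses", "a hat"],
}

# Hypothetical prompt template; each placeholder matches a key in ATTRIBUTES.
TEMPLATE = (
    "A realistic close-up portrait photo of a {age} {gender} with "
    "{skin_tone} skin, {hairstyle}, {makeup}, wearing {accessory}"
)


def build_prompts(attributes, template):
    """Return one prompt per combination of attribute values (Cartesian product)."""
    keys = list(attributes)
    prompts = []
    for values in product(*(attributes[k] for k in keys)):
        # Map each attribute name to its value for this combination.
        prompts.append(template.format(**dict(zip(keys, values))))
    return prompts


if __name__ == "__main__":
    prompts = build_prompts(ATTRIBUTES, TEMPLATE)
    print(len(prompts))  # 3 * 2 * 3 * 2 * 2 * 3 = 216 combinations
    print(prompts[0])
```

Because the full grid grows multiplicatively with each added attribute, a practical pipeline would likely sample a subset of combinations (for example, uniformly at random) before passing the prompts to the image generator; how the authors balance or sample combinations is not specified here.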

