AI systems rely on extensive training on large datasets to address various tasks. However, image-based systems, particularly those used for demographic attribute prediction, face significant challenges. Many current face image datasets focus primarily on demographic factors such as age, gender, and skin tone, overlooking other crucial facial attributes such as hairstyle and accessories. This narrow focus limits the diversity of the data and, consequently, the robustness of AI systems trained on it. This work addresses this limitation by proposing a methodology for generating synthetic face image datasets that capture a broader spectrum of facial diversity. Specifically, our approach integrates a systematic prompt formulation strategy that encompasses not only demographics and biometrics but also non-permanent traits such as make-up, hairstyle, and accessories. These prompts guide a state-of-the-art text-to-image model in generating a comprehensive dataset of high-quality, realistic images, which can be used as an evaluation set for face analysis systems. Compared to existing datasets, our proposed dataset proves equally or more challenging in image classification tasks while being much smaller in size.
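As a rough illustration of what a systematic prompt formulation strategy could look like, the sketch below enumerates every combination of a few attribute categories and fills them into a text template. The category values, the template wording, and the helper name `build_prompts` are all assumptions made for illustration; the abstract names the categories (demographics and non-permanent traits such as make-up, hairstyle, and accessories) but does not specify the actual vocabulary or prompt format.

```python
from itertools import product

# Hypothetical attribute vocabularies: the abstract lists the categories,
# not the concrete values, so everything here is illustrative only.
ATTRIBUTES = {
    "age": ["young", "middle-aged", "senior"],
    "gender": ["woman", "man"],
    "skin_tone": ["light", "medium", "dark"],
    "hairstyle": ["short hair", "long curly hair"],
    "makeup": ["no make-up", "bold make-up"],
    "accessory": ["no accessories", "glasses", "a hat"],
}

# Hypothetical prompt template for a text-to-image model.
TEMPLATE = (
    "A realistic portrait photo of a {age} {gender} with {skin_tone} skin, "
    "{hairstyle}, {makeup}, wearing {accessory}"
)


def build_prompts(attributes=ATTRIBUTES, template=TEMPLATE):
    """Return one record per attribute combination.

    Each record keeps the prompt together with the attribute values used
    to build it, so the values can later serve as ground-truth labels.
    """
    keys = list(attributes)
    records = []
    for values in product(*(attributes[k] for k in keys)):
        labels = dict(zip(keys, values))
        records.append({"prompt": template.format(**labels), "labels": labels})
    return records


prompts = build_prompts()
print(len(prompts))
print(prompts[0]["prompt"])
```

Storing the attribute values next to each prompt is one possible design choice: under the assumption that the generated image matches its prompt, the same values can be reused as labels when the images serve as an evaluation set.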