AI systems rely on extensive training on large datasets to address a wide range of tasks. However, image-based systems, particularly those used for demographic attribute prediction, face significant challenges. Many current face image datasets focus primarily on demographic factors such as age, gender, and skin tone, overlooking other crucial facial attributes such as hairstyle and accessories. This narrow focus limits the diversity of the data and, consequently, the robustness of the AI systems trained on it. This work addresses this limitation by proposing a methodology for generating synthetic face image datasets that capture a broader spectrum of facial diversity. Specifically, our approach integrates a systematic prompt formulation strategy that encompasses not only demographics and biometrics but also non-permanent traits such as make-up, hairstyle, and accessories. These prompts guide a state-of-the-art text-to-image model in generating a comprehensive dataset of high-quality, realistic images, which can be used as an evaluation set for face analysis systems. Compared to existing datasets, our proposed dataset proves equally or more challenging in image classification tasks while being much smaller in size.
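As an illustration of what a systematic prompt formulation strategy could look like, the following is a minimal sketch and not the authors' actual implementation. It assumes that prompts are built by enumerating every combination of a small attribute vocabulary and filling a fixed template. The attribute categories follow the abstract; the attribute values, the template wording, and the names `ATTRIBUTES`, `TEMPLATE`, and `build_prompts` are hypothetical.

```python
from itertools import product

# Hypothetical attribute vocabulary. The categories mirror the abstract
# (demographics plus non-permanent traits); every value is an assumption.
ATTRIBUTES = {
    "age": ["young", "middle-aged", "elderly"],
    "gender": ["woman", "man"],
    "skin_tone": ["light", "medium", "dark"],
    "hairstyle": ["short curly hair", "long straight hair"],
    "makeup": ["no make-up", "bold make-up"],
    "accessory": ["no accessories", "glasses", "a hat"],
}

# Hypothetical prompt template; the wording is an assumption.
TEMPLATE = (
    "A realistic photo of a {age} {gender} with {skin_tone} skin, "
    "{hairstyle}, {makeup}, wearing {accessory}"
)


def build_prompts(attributes=ATTRIBUTES, template=TEMPLATE):
    """Return one prompt per attribute combination, paired with its labels."""
    keys = list(attributes)
    prompts = []
    # Cartesian product over all attribute values, in dictionary order.
    for combo in product(*(attributes[k] for k in keys)):
        labels = dict(zip(keys, combo))
        prompts.append({"prompt": template.format(**labels), "labels": labels})
    return prompts


if __name__ == "__main__":
    prompts = build_prompts()
    print(len(prompts))          # 216 combinations (3 * 2 * 3 * 2 * 2 * 3)
    print(prompts[0]["prompt"])
```

Keeping the labels next to each prompt means every generated image would carry its intended attribute annotations, which is what would allow the images to serve as an evaluation set. An exhaustive product grows quickly as attributes are added, so a larger vocabulary would likely call for sampling combinations instead.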