Multi-Task Faces (MTF) Data Set: A Legally and Ethically Compliant Collection of Face Images for Various Classification Tasks

Human facial data hold tremendous potential for a variety of classification problems, including face recognition, age estimation, gender identification, emotion analysis, and race classification. However, recent privacy regulations, such as the EU General Data Protection Regulation, have restricted the ways in which human images may be collected and used for research. As a result, several previously published face data sets have been removed from the internet because their data collection methods failed to meet privacy regulations. Synthetic data sets have been proposed as an alternative, but they fall short of accurately representing the real data distribution. Moreover, most available data sets are labeled for just a single task, which limits their applicability. To address these issues, we present the Multi-Task Faces (MTF) image data set, a meticulously curated collection of face images designed for various classification tasks, including face recognition and race, gender, and age classification. The MTF data set has been ethically gathered by leveraging publicly available images of celebrities and strictly adhering to copyright regulations. In this paper, we present the data set and describe in detail the data collection and processing procedures we followed. Furthermore, we evaluate the performance of five deep learning (DL) models on the MTF data set across the aforementioned classification tasks. Additionally, we compare the performance of DL models on the processed MTF data with their performance on raw data crawled from the internet. The reported results constitute a baseline for further research employing these data. The MTF data set can be accessed through the following link (please cite the present paper if you use the data set): https://github.com/RamiHaf/MTF_data_set
