Face recognition datasets are often collected by crawling the Internet without individuals' consent, raising ethical and privacy concerns. Generating synthetic datasets for training face recognition models has emerged as a promising alternative. However, generating synthetic datasets remains challenging because it requires adequate inter-class and intra-class variation. While advances in generative models have made it easier to increase intra-class variation in face datasets (e.g., pose and illumination), generating sufficient inter-class variation is still difficult. In this paper, we formulate dataset generation as a packing problem on the embedding space of a face recognition model (represented as a hypersphere) and propose a new synthetic dataset generation approach called HyperFace. We formalize the packing problem as an optimization problem and solve it with a gradient descent-based approach. We then use a conditional face generator model to synthesize face images from the optimized embeddings. We use the generated datasets to train face recognition models and evaluate the trained models on several real benchmark datasets. Our experimental results show that models trained with HyperFace achieve state-of-the-art performance among face recognition models trained on synthetic datasets.
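The idea of packing identity embeddings on a hypersphere with gradient descent can be illustrated with a short NumPy sketch. It assumes a smooth-maximum (log-sum-exp) surrogate of the largest pairwise cosine similarity, optimized by projected gradient descent with renormalization back onto the unit sphere. The function names `pack_hypersphere` and `max_pairwise_cosine`, the surrogate objective, and all hyperparameters are illustrative assumptions, not the exact HyperFace objective or implementation.

```python
import numpy as np


def pack_hypersphere(n_points, dim, steps=500, lr=0.05, temperature=0.05, seed=0):
    """Spread n_points unit vectors on the sphere S^(dim-1) (illustrative sketch).

    Assumed objective: T * logsumexp(s_ij / T) over all pairs i != j, a smooth
    upper bound on the maximum pairwise cosine similarity s_ij.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_points, dim))
    x /= np.linalg.norm(x, axis=1, keepdims=True)  # start on the unit sphere
    off_diag = ~np.eye(n_points, dtype=bool)       # exclude self-similarity

    for _ in range(steps):
        sim = x @ x.T  # cosine similarities, since rows have unit norm
        logits = np.where(off_diag, sim / temperature, -np.inf)
        logits -= logits[off_diag].max()           # numerical stability
        w = np.exp(logits)                         # diagonal becomes exactly 0
        w /= w.sum()                               # softmax weights over all pairs
        # Gradient of the surrogate w.r.t. x_k is sum_j (w_kj + w_jk) x_j.
        grad = (w + w.T) @ x
        # Keep only the component tangent to the sphere at each point.
        grad -= np.sum(grad * x, axis=1, keepdims=True) * x
        x -= lr * grad
        x /= np.linalg.norm(x, axis=1, keepdims=True)  # retract onto the sphere
    return x


def max_pairwise_cosine(x):
    """Largest cosine similarity between two distinct unit vectors in x."""
    sim = x @ x.T
    np.fill_diagonal(sim, -1.0)
    return float(sim.max())


if __name__ == "__main__":
    packed = pack_hypersphere(32, 8)
    print("max pairwise cosine after packing:", max_pairwise_cosine(packed))
```

A lower temperature concentrates the gradient on the closest pairs, so the surrogate tracks the true worst-case similarity more tightly at the cost of slower, less smooth progress. In the pipeline described above, the optimized vectors would then condition a face generator; that step depends on the specific generator and is not sketched here.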