HyperFace: Generating Synthetic Face Recognition Datasets by Exploring Face Embedding Hypersphere

Face recognition datasets are often collected by crawling the Internet without individuals' consent, raising ethical and privacy concerns. Generating synthetic datasets for training face recognition models has emerged as a promising alternative. However, generating synthetic datasets remains challenging because it requires adequate inter-class and intra-class variation. While advances in generative models have made it easier to increase intra-class variation in face datasets (such as pose and illumination), generating sufficient inter-class variation is still difficult. In this paper, we formulate dataset generation as a packing problem on the embedding space (represented on a hypersphere) of a face recognition model and propose a new synthetic dataset generation approach, called HyperFace. We formalize the packing problem as an optimization problem and solve it with a gradient descent-based approach. We then use a conditional face generator model to synthesize face images from the optimized embeddings. We use the generated datasets to train face recognition models and evaluate the trained models on several real benchmark datasets. Our experimental results show that models trained with HyperFace achieve state-of-the-art performance among face recognition models trained on synthetic datasets.
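To make the packing idea concrete, the following is a minimal sketch of spreading points over a unit hypersphere with gradient descent. It is an illustration under stated assumptions, not the objective used in the paper: the loss here is a smooth (log-sum-exp) surrogate for each point's highest cosine similarity to any other point, and the function names, the temperature `tau`, the step size, and the random initialization are all hypothetical choices. Each step is followed by re-normalization so the points stay on the sphere.

```python
import numpy as np


def max_offdiag_cosine(x):
    """Largest cosine similarity between two distinct unit vectors in x."""
    sim = x @ x.T
    np.fill_diagonal(sim, -np.inf)  # ignore each point's similarity to itself
    return sim.max()


def pack_on_hypersphere(n_points=64, dim=16, steps=300, lr=0.1, tau=0.1, seed=0):
    """Spread points on the unit hypersphere (illustrative sketch only).

    Assumed loss: sum_i tau * log sum_{j != i} exp(<x_i, x_j> / tau),
    a smooth stand-in for each point's nearest-neighbor similarity.
    Returns the initial and the optimized points.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n_points, dim))
    x /= np.linalg.norm(x, axis=1, keepdims=True)  # project onto the sphere
    x0 = x.copy()

    for _ in range(steps):
        sim = x @ x.T
        np.fill_diagonal(sim, -np.inf)
        # Row-wise softmax over neighbors, shifted by the row max for stability.
        e = np.exp((sim - sim.max(axis=1, keepdims=True)) / tau)
        p = e / e.sum(axis=1, keepdims=True)
        # Gradient of the assumed loss with respect to every point.
        grad = (p + p.T) @ x
        x = x - lr * grad
        x /= np.linalg.norm(x, axis=1, keepdims=True)  # retract to the sphere

    return x0, x
```

Calling `pack_on_hypersphere()` and comparing `max_offdiag_cosine` on the two returned arrays shows whether the closest pair of points has been pushed apart. In a pipeline like the one described above, each optimized vector would then serve as the identity embedding passed to a conditional face generator, a step this sketch does not cover.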
