Despite the success of deep-learning models in many tasks, there are concerns that such models learn shortcuts and lack robustness to irrelevant confounders. For models trained directly on human faces, one sensitive confounder is human identity. Many face-related tasks should ideally be identity-independent and perform uniformly across individuals (i.e., be fair). One way to measure and enforce such robustness and performance uniformity is to impose it as a constraint during training, which assumes that identity-related information is available at scale. However, because of privacy concerns and the cost of collecting such information, this is often not the case, and most face datasets contain only input images and their task-related labels. Improving identity-related robustness without such annotations is therefore of great importance. Here, we explore using face-recognition embedding vectors as proxies for identity to enforce such robustness. We propose to use the structure of the face-recognition embedding space to implicitly emphasize rare samples within each class. We do so by weighting samples according to their conditional inverse density (CID) in the proxy embedding space. Our experiments suggest that this simple sample-weighting scheme not only improves training robustness but often also improves overall performance as a result. We also show that employing such constraints during training yields models that are significantly less sensitive to different levels of bias in the dataset.
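To make the weighting idea concrete, the following is a minimal sketch of conditional inverse density weighting, not the implementation used in this work. The abstract does not specify how density is estimated, so several choices here are assumptions made purely for illustration: a Gaussian kernel density estimate, Euclidean distance between embeddings, a hypothetical `bandwidth` parameter, and normalization of the weights to a mean of 1 within each class. The function name `cid_weights` is likewise hypothetical.

```python
import math
from collections import defaultdict


def cid_weights(embeddings, labels, bandwidth=0.5):
    """Weight each sample by the inverse of its estimated density,
    where the density is computed only among samples of the same class.

    Assumptions (not taken from the source): Gaussian kernel density
    estimate, Euclidean distance, and per-class normalization so that
    the weights in each class average to 1.
    """
    # Group sample indices by task label, so density is conditional on class.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    weights = [0.0] * len(labels)
    for idxs in by_class.values():
        inverse_densities = []
        for i in idxs:
            total = 0.0
            for j in idxs:
                sq_dist = sum(
                    (a - b) ** 2 for a, b in zip(embeddings[i], embeddings[j])
                )
                total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
            # The self-term contributes 1, so the density is never zero.
            density = total / len(idxs)
            inverse_densities.append(1.0 / density)

        # Normalize so each class keeps the same total influence on the loss.
        mean_inverse = sum(inverse_densities) / len(inverse_densities)
        for i, inv in zip(idxs, inverse_densities):
            weights[i] = inv / mean_inverse
    return weights


# Toy proxy embeddings: class 0 has three clustered samples and one outlier.
embeddings = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [3.0, 3.0],
              [1.0, 1.0], [1.1, 1.0]]
labels = [0, 0, 0, 0, 1, 1]
weights = cid_weights(embeddings, labels)
```

In this toy example, the isolated sample at index 3 receives the largest weight in class 0, while the two symmetric samples in class 1 receive equal weights. In training, such weights would typically multiply the per-sample loss before averaging; in practice the embeddings would come from a pretrained face-recognition model rather than hand-written vectors.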