Vision Transformers (ViTs) have achieved excellent performance and demonstrated promising potential in various computer vision tasks. Their wide deployment in real-world tasks requires a thorough understanding of the model's societal impact. However, most ViT-based works do not take fairness into account, and it is unclear whether directly applying CNN-oriented debiasing algorithms to ViT is feasible. Moreover, previous works typically sacrifice accuracy for fairness. Therefore, we aim to develop an algorithm that improves accuracy without sacrificing fairness. In this paper, we propose FairViT, a novel accurate and fair ViT framework. To this end, we introduce a novel distance loss and deploy adaptive fairness-aware masks on the attention layers that are updated together with the model parameters. Experimental results show that FairViT achieves better accuracy than alternative methods with competitive computational efficiency. Furthermore, FairViT attains appreciable fairness results.
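To make the two mechanisms named in the abstract more concrete, the sketch below shows one plausible PyTorch reading under stated assumptions: a learnable additive mask on the attention scores that is optimized jointly with the model parameters, and a simple group-wise distance loss. The names MaskedSelfAttention, fair_mask, and group_distance_loss are hypothetical illustrations, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MaskedSelfAttention(nn.Module):
    """Self-attention with a learnable, fairness-aware mask added to the
    attention scores. The mask is a free parameter trained together with
    the rest of the model (illustrative sketch, not the paper's code)."""

    def __init__(self, dim: int, num_heads: int, seq_len: int):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        # One learnable mask per head over the (seq_len x seq_len) score matrix.
        self.fair_mask = nn.Parameter(torch.zeros(num_heads, seq_len, seq_len))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, d = x.shape
        qkv = self.qkv(x).reshape(b, n, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)           # each: (b, heads, n, head_dim)
        scores = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5
        scores = scores + self.fair_mask                # adaptive fairness-aware mask
        attn = scores.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, n, d)
        return self.proj(out)


def group_distance_loss(features: torch.Tensor, groups: torch.Tensor) -> torch.Tensor:
    """Hypothetical distance loss: penalize the gap between each sensitive
    group's mean feature and the overall mean feature."""
    overall = features.mean(dim=0)
    loss = features.new_zeros(())
    for g in groups.unique():
        loss = loss + F.mse_loss(features[groups == g].mean(dim=0), overall)
    return loss
```

In this reading, the mask term receives gradients from both the task loss and the distance loss, so the attention pattern is steered toward group-invariant features while the backbone keeps optimizing accuracy; the actual loss formulation and mask placement in FairViT may differ.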