Security-critical machine-learning (ML) systems, such as face-recognition systems, are susceptible to adversarial examples, including real-world, physically realizable attacks. Various means to boost ML's adversarial robustness have been proposed; however, they typically induce unfair robustness: it is often easier to attack from certain classes or groups than from others. Several techniques have been developed to improve adversarial robustness while seeking perfect fairness between classes. Yet, prior work has focused on settings where security and fairness are less critical. Our insight is that achieving perfect parity in realistic fairness-critical tasks, such as face recognition, is often infeasible: some classes may be highly similar, leading to more misclassifications between them. Instead, we suggest that seeking symmetry, i.e., that attacks from class $i$ to class $j$ are as successful as attacks from $j$ to $i$, is more tractable. Intuitively, symmetry is desirable because class resemblance is a symmetric relation in most domains. Additionally, as we prove theoretically, symmetry between individuals induces symmetry between any set of sub-groups, in contrast to other fairness notions, where group fairness is often elusive. We develop Sy-FAR, a technique that encourages symmetry while also optimizing adversarial robustness, and we evaluate it extensively on five datasets and three model architectures, including against targeted and untargeted realistic attacks. The results show that Sy-FAR significantly improves fair adversarial robustness compared to state-of-the-art methods. Moreover, we find that Sy-FAR is faster and more consistent across runs. Notably, Sy-FAR also ameliorates another type of unfairness we discover in this work: target classes into which adversarial examples are likely to be classified become significantly less vulnerable after inducing symmetry.
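To make the symmetry notion concrete, here is a minimal sketch under assumed notation (the attack-success matrix $A$ and the uniform-averaging aggregation below are our illustrative assumptions, not definitions taken from this abstract). Let $A(i,j)$ denote the success rate of adversarial attacks that perturb inputs of class $i$ so that they are classified as class $j$. Symmetry then asks that
\[
A(i,j) \approx A(j,i) \qquad \text{for all classes } i \neq j,
\]
whereas perfect parity would require (approximately) equal success rates across all class pairs. The group-level claim also follows directly under this notation: if group-level success is aggregated by averaging over class pairs, then for disjoint groups $G_1$ and $G_2$,
\[
A(G_1, G_2) \;=\; \frac{1}{|G_1|\,|G_2|} \sum_{i \in G_1} \sum_{j \in G_2} A(i,j)
\;=\; \frac{1}{|G_2|\,|G_1|} \sum_{j \in G_2} \sum_{i \in G_1} A(j,i)
\;=\; A(G_2, G_1),
\]
where the middle equality holds term by term from pairwise symmetry.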