Deep neural networks are vulnerable to adversarial perturbations that can simultaneously degrade prediction robustness and individual fairness across diverse application settings. However, existing evaluation protocols typically assess these two dimensions in isolation, which obscures critical failure modes. To bridge this gap, we formalize Robust Individual Fairness (RIF): under semantics-preserving (truth-condition-preserving) perturbations, predictions should remain both correct with respect to the ground truth and invariant across semantically equivalent individuals. To surface RIF violations in practice, we introduce RIFair, a black-box adversarial framework that uses a decoupled perturbation strategy to construct instance pairs that preserve semantics yet elicit unrobust and/or unfair predictions. Experiments across multiple model architectures and real-world textual datasets show that robustness-only or fairness-only metrics often miss Robust Biased and Unrobust Fair behaviors. RIFair reliably exposes these hidden vulnerabilities, supporting RIF as a necessary criterion for trustworthy model assessment. The experimental code is publicly available at https://github.com/Xuran-LI/RIFair.
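As an illustration of why the two criteria need to be checked jointly, the following minimal sketch labels a single perturbed instance pair. It rests on an assumed reading of the definition above and is not taken from the RIFair code: robustness is taken to mean that the prediction on the perturbed instance matches the ground truth, and fairness that this prediction matches the one on its semantically equivalent counterpart. The function name `rif_category` and its arguments are hypothetical.

```python
from typing import Hashable


def rif_category(y_true: Hashable, pred: Hashable, pred_counterpart: Hashable) -> str:
    """Label one perturbed instance pair (illustrative assumption, not the paper's code).

    y_true           -- ground-truth label of the perturbed instance
    pred             -- model prediction on the perturbed instance
    pred_counterpart -- model prediction on its semantically equivalent counterpart
    """
    # Assumed reading: robust = still correct, fair = invariant across the pair.
    robust = pred == y_true
    fair = pred == pred_counterpart

    if robust and fair:
        return "Robust Fair"      # satisfies RIF under this reading
    if robust:
        return "Robust Biased"    # correct, but the counterpart is treated differently
    if fair:
        return "Unrobust Fair"    # consistent across the pair, but wrong
    return "Unrobust Biased"      # wrong and inconsistent


if __name__ == "__main__":
    # A robustness-only check passes the first case; a fairness-only check passes the second.
    print(rif_category(y_true=1, pred=1, pred_counterpart=0))
    print(rif_category(y_true=1, pred=0, pred_counterpart=0))
```

Under these assumptions, a robustness-only metric would accept the Robust Biased case and a fairness-only metric would accept the Unrobust Fair case, which is the gap a joint criterion is meant to close.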