As responsible AI gains importance in machine learning, properties such as fairness, adversarial robustness, and causality have received considerable attention in recent years. Despite their individual significance, little work explores and integrates these properties simultaneously. In this paper, we propose a novel approach that examines the relationship between individual fairness, adversarial robustness, and structural causal models in heterogeneous data spaces, particularly when sensitive attributes are discrete. We use structural causal models and sensitive attributes to construct a fair metric, and we apply it to measure semantic similarity among individuals. By introducing a novel causal adversarial perturbation and applying adversarial training, we derive a new regularizer that combines individual fairness, causality, and robustness in the classifier. We evaluate our method on both real-world and synthetic datasets, demonstrating its effectiveness in achieving an accurate classifier that simultaneously exhibits fairness, adversarial robustness, and causal awareness.