Federated Learning (FL) is an important emerging distributed training paradigm that keeps data private on clients. It is now well understood that by controlling only a small subset of FL clients, an attacker can introduce a backdoor into the aggregated model that activates in the presence of certain attributes. In this paper, we present a new type of attack that instead compromises the fairness of the trained model. Fairness is understood to be the attribute-level performance distribution of a trained model; it is particularly salient in domains where, for example, skewed accuracy between subpopulations could have disastrous consequences. We find that by employing a threat model similar to that of a backdoor attack, an attacker is able to steer the aggregated model toward an unfair performance distribution between any given set of attributes. Furthermore, we find that this attack is possible while controlling only a single client. While combating naturally induced unfairness in FL has previously been discussed in depth, the artificially induced kind has been neglected. We show that defending against attacks on fairness should be a critical consideration in any situation where unfairness in a trained model could benefit a user who participated in its training.
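To make the threat model concrete, the following is a minimal sketch, not the paper's exact method, of how a single malicious client could bias its local objective against one attribute group and then scale its update (model replacement, as in backdoor attacks) so it survives server-side averaging. It assumes a FedAvg-style server and a PyTorch toy model; the helper name `malicious_client_update`, the `boost` factor, the binary `group` tensor, and the synthetic data are all illustrative assumptions.

```python
import copy

import torch
import torch.nn as nn


def malicious_client_update(global_model, x, y, group, epochs=5, lr=0.1, boost=10.0):
    """Return poisoned weights from a single malicious client (illustrative sketch).

    The local objective is signed per-group: loss is minimized on samples with
    group == 0 and maximized on samples with group == 1, skewing the
    attribute-level performance distribution. The resulting weight delta is
    then scaled by `boost` so it survives averaging with honest clients.
    """
    model = copy.deepcopy(global_model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    per_sample_loss = nn.CrossEntropyLoss(reduction="none")
    for _ in range(epochs):
        opt.zero_grad()
        losses = per_sample_loss(model(x), y)
        # Flip the sign for the targeted group: descending this objective
        # performs gradient *ascent* on that group's loss.
        signed = torch.where(group == 1, -losses, losses)
        signed.mean().backward()
        opt.step()
    # Model replacement: scale the delta relative to the current global weights.
    with torch.no_grad():
        poisoned = {
            name: w_old + boost * (model.state_dict()[name] - w_old)
            for name, w_old in global_model.state_dict().items()
        }
    return poisoned


# Hypothetical toy setup: a small classifier and synthetic data with a
# binary protected attribute.
net = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
x = torch.randn(64, 10)
y = torch.randint(0, 2, (64,))
group = torch.randint(0, 2, (64,))
update = malicious_client_update(net, x, y, group)
```

If the server averages n client updates equally, the attacker's delta is diluted by a factor of 1/n, so choosing `boost` on the order of n approximately re-applies it in full; this is the same scaling argument that makes single-client model-replacement backdoors effective.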