As digital transformation continues, enterprises generate, manage, and store vast amounts of data, while artificial intelligence technology advances rapidly. These developments bring challenges for information security and data security. Data security refers to protecting digital information from unauthorized access, damage, theft, and similar threats throughout its entire life cycle. With the promulgation and implementation of data security laws, and with organizations and users placing growing emphasis on data security and data privacy, privacy-preserving technologies such as federated learning have a wide range of application scenarios. Federated learning is a distributed machine learning framework that allows multiple parties to jointly train a model without sharing their data, which protects data privacy and addresses the problem of data islands. However, the parties' data are independent of one another, and differences in data quality can cause fairness issues in federated learning, such as data bias across parties that results in biased and discriminatory models. We therefore propose DBFed, a debiasing federated learning framework based on a domain-independent approach, which mitigates model bias by explicitly encoding sensitive attributes during client-side training. We conduct experiments on three real datasets and use five evaluation metrics covering accuracy and fairness to quantify model performance. DBFed outperforms the three comparison methods on most metrics, demonstrating its debiasing effect.
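To make the two ideas in the abstract concrete (joint training without sharing raw data, and quantifying fairness), the following is a minimal sketch rather than the DBFed implementation. The abstract does not name the aggregation rule or the specific fairness metrics, so the sketch assumes sample-weighted parameter averaging in the style of FedAvg and the demographic parity gap as illustrative choices. The function names `fed_avg` and `demographic_parity_gap` are hypothetical.

```python
import numpy as np


def fed_avg(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local sample counts.

    Assumption: each client sends only its trained parameters (never its raw
    data), and the server combines them FedAvg-style.
    """
    sizes = np.asarray(client_sizes, dtype=float)
    stacked = np.stack(client_weights)  # shape: (n_clients, n_params)
    return (stacked * sizes[:, None]).sum(axis=0) / sizes.sum()


def demographic_parity_gap(y_pred, sensitive):
    """Absolute difference in positive-prediction rate between two groups.

    Assumption: binary predictions (0/1) and a binary sensitive attribute (0/1).
    A gap of 0 means both groups receive positive predictions at the same rate.
    """
    y_pred = np.asarray(y_pred)
    sensitive = np.asarray(sensitive)
    rate_group0 = y_pred[sensitive == 0].mean()
    rate_group1 = y_pred[sensitive == 1].mean()
    return abs(rate_group0 - rate_group1)


# Two hypothetical clients holding 1 and 3 local samples respectively.
global_weights = fed_avg([np.array([1.0, 2.0]), np.array([3.0, 4.0])], [1, 3])

# Predictions for four individuals, two from each sensitive group.
gap = demographic_parity_gap([1, 0, 1, 1], [0, 0, 1, 1])
```

Weighting by sample count lets clients with more data contribute more to the joint model. This is also one way that bias in a large client's data could carry over into the global model, which is the kind of effect a fairness metric is meant to expose.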