In Federated Learning, it is crucial to handle low-quality, corrupted, or malicious data. However, traditional data valuation methods are not suitable due to privacy concerns. To address this, we propose a simple yet effective approach that uses a new influence approximation, called "lazy influence", to filter and score data while preserving privacy. Each participant uses their own data to estimate the influence of another participant's batch and sends a differentially private, obfuscated score to the central coordinator. Our method successfully filters out biased and corrupted data in various simulated and real-world settings, achieving a recall of over $90\%$ (sometimes up to $100\%$) while maintaining strong differential privacy guarantees with $\varepsilon \leq 1$.
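The scoring step described above can be illustrated with a minimal sketch. The abstract does not specify how the influence is computed or how the score is obfuscated, so the sketch makes several assumptions: the model is a one-parameter linear regressor, the influence of a batch is estimated as the drop in the validator's own loss after a single gradient step on that batch, the score is reduced to a binary vote, and the vote is obfuscated with standard randomized response. All function names (`mse`, `sgd_step`, `influence_estimate`, `private_vote`) are hypothetical and are not taken from the paper.

```python
import math
import random


def mse(w, xs, ys):
    """Mean squared error of the 1-D linear model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def sgd_step(w, xs, ys, lr=0.1):
    """One gradient step on a contributor's batch (assumed update rule)."""
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    return w - lr * grad


def influence_estimate(w, batch, validation):
    """Assumed influence proxy: loss reduction on the validator's own data
    after applying the update computed from the contributor's batch."""
    w_new = sgd_step(w, *batch)
    return mse(w, *validation) - mse(w_new, *validation)


def private_vote(influence, epsilon, rng):
    """Binary vote obfuscated with randomized response (an assumption).

    The true vote is reported with probability e^eps / (1 + e^eps) and
    flipped otherwise, which satisfies eps-differential privacy for one bit.
    """
    vote = influence > 0
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return vote if rng.random() < p_truth else not vote


# Toy data: the validator holds clean samples of y = 2x.
validation = ([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
clean_batch = ([1.0, 2.0], [2.0, 4.0])        # consistent with y = 2x
corrupted_batch = ([1.0, 2.0], [-2.0, -4.0])  # labels flipped in sign

rng = random.Random(0)
w0 = 0.0
for name, batch in [("clean", clean_batch), ("corrupted", corrupted_batch)]:
    score = influence_estimate(w0, batch, validation)
    vote = private_vote(score, epsilon=1.0, rng=rng)
    print(name, "influence:", score, "noisy vote:", vote)
```

In this sketch only the noisy vote would be sent to the coordinator, which could aggregate votes from several validators and discard batches that receive too few positive votes. With $\varepsilon = 1$ a single vote is truthful with probability about $0.73$, so the aggregation over many validators is what would make the filtering reliable under these assumptions.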