Federated Learning is by nature susceptible to low-quality, corrupted, or even malicious data that can severely degrade the quality of the learned model. Traditional data valuation techniques cannot be applied because the data is never revealed. We present a novel technique for filtering and scoring data based on a practical influence approximation (`lazy' influence) that can be implemented in a privacy-preserving manner. Each participant uses its own data to evaluate the influence of another participant's batch and reports an obfuscated score, protected by differential privacy, to the center. Our technique allows for highly effective filtering of corrupted data in a variety of applications. Importantly, we show that most of the corrupted data can be filtered out (recall of $>90\%$, and even up to $100\%$), even under strong privacy guarantees ($\varepsilon \leq 1$).
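The reporting step described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not state: each validating participant reduces its influence estimate to a single bit (did the candidate batch lower the loss on the validator's own data?), the bit is obfuscated with randomized response, which satisfies $\varepsilon$-local differential privacy, and the center debiases the noisy votes before applying a threshold. All function names, the loss-based vote, and the threshold of 0.5 are hypothetical choices made for illustration, not the authors' implementation.

```python
import math
import random


def lazy_influence_vote(loss_before: float, loss_after: float) -> bool:
    """Assumed vote rule: True if the candidate batch lowered the
    validator's loss on its own held-out data."""
    return loss_after < loss_before


def randomized_response(vote: bool, epsilon: float, rng: random.Random) -> bool:
    """Report the true bit with probability e^eps / (1 + e^eps), else flip it.
    This mechanism satisfies epsilon-local differential privacy."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return vote if rng.random() < p_truth else not vote


def debiased_positive_fraction(reports: list, epsilon: float) -> float:
    """Unbiased estimate of the fraction of true positive votes,
    computed by the center from the noisy reports (requires epsilon > 0)."""
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    # E[observed] = q * p + (1 - q) * (1 - p); solve for q.
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)


def accept_batch(reports: list, epsilon: float, threshold: float = 0.5) -> bool:
    """Hypothetical filtering rule: keep the batch if the estimated share
    of positive votes exceeds the threshold."""
    return debiased_positive_fraction(reports, epsilon) > threshold
```

In this sketch, each validator would call `lazy_influence_vote` on its own before/after losses, pass the result through `randomized_response` with the agreed $\varepsilon$, and send only the noisy bit to the center, which then calls `accept_batch` on the collected reports. Because a smaller $\varepsilon$ increases the noise in each report, more validators are needed per batch to reach the same filtering accuracy.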