Collaborative Machine Learning (CML) allows participants to jointly train a machine learning model while keeping their training data private. In scenarios where privacy is a strong requirement, such as health-related applications, safety is also a primary concern. This means that privacy-preserving CML processes must produce models that output correct and reliable decisions \emph{even in the presence of potentially untrusted participants}. In response to this issue, researchers have proposed \textit{robust aggregators}, which rely on metrics to filter out malicious contributions that could compromise the training process. In this work, we formalize the landscape of robust aggregators in the literature. Our formalization allows us to show that existing robust aggregators cannot fulfill their goal: they either use distance-based metrics that cannot accurately identify targeted malicious updates, or propose methods whose success is in direct conflict with the ability of CML participants to learn from others, and which therefore cannot eliminate the risk of manipulation without preventing learning.