Implementing kernel methods is challenging when the data sources are distributed and cannot be pooled at a trusted third party for privacy reasons. It is even more challenging when the use case rules out privacy-preserving approaches that introduce noise, as in machine learning on clinical data. To enable exact, privacy-preserving computation of kernel methods, we propose FLAKE, a Federated Learning Approach for KErnel methods on horizontally distributed data. With FLAKE, the data sources mask their data so that a centralized instance can compute a Gram matrix without compromising privacy. From the Gram matrix, many kernel matrices can be calculated, which can then be used to train kernel-based machine learning algorithms such as Support Vector Machines. We prove that FLAKE prevents an adversary from learning the input data or the number of input features under a semi-honest threat model. Experiments on clinical and synthetic data confirm that FLAKE outperforms comparable methods in accuracy and efficiency. The time needed to mask the data and compute the Gram matrix is several orders of magnitude less than the time needed to train a Support Vector Machine. Thus, FLAKE can be applied to many use cases.
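To make the data flow described above concrete, the following is a minimal sketch, not FLAKE's actual masking scheme, which the abstract does not specify. It assumes, purely for illustration, that all data sources share a secret seed and multiply their rows by the same random matrix with orthonormal columns. Such a mask preserves pairwise dot products exactly (no noise) and pads the feature dimension. The function names (`make_mask`, `mask_data`, `gram_matrix`) and all parameters are hypothetical. The second half shows the standard identities by which linear, polynomial, and RBF kernel matrices can be derived from a Gram matrix alone.

```python
import numpy as np


def make_mask(n_features, masked_dim, seed):
    """Return a (masked_dim x n_features) matrix Q with Q.T @ Q = I.

    Illustrative assumption: every data source derives the same Q from a
    shared secret seed that the central instance does not know.
    """
    if masked_dim < n_features:
        raise ValueError("masked_dim must be >= n_features")
    rng = np.random.default_rng(seed)
    # Reduced QR of a random Gaussian matrix gives orthonormal columns.
    q, _ = np.linalg.qr(rng.standard_normal((masked_dim, n_features)))
    return q


def mask_data(x, mask):
    """Data source side: map rows of x (n x d) to masked rows (n x masked_dim)."""
    return x @ mask.T


def gram_matrix(masked_parts):
    """Central instance: stack masked rows and compute all pairwise dot products."""
    stacked = np.vstack(masked_parts)
    return stacked @ stacked.T


def linear_kernel(gram):
    # The Gram matrix of dot products is itself the linear kernel.
    return gram


def polynomial_kernel(gram, degree=3, coef0=1.0, gamma=1.0):
    # (gamma * <x_i, x_j> + coef0) ** degree, applied element-wise.
    return (gamma * gram + coef0) ** degree


def rbf_kernel(gram, gamma=0.5):
    # ||x_i - x_j||^2 = G_ii + G_jj - 2 * G_ij
    diag = np.diag(gram)
    sq_dist = diag[:, None] + diag[None, :] - 2.0 * gram
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    hospital_a = rng.standard_normal((4, 3))  # 4 samples, 3 features
    hospital_b = rng.standard_normal((5, 3))  # 5 samples, same 3 features
    mask = make_mask(n_features=3, masked_dim=8, seed=42)
    gram = gram_matrix([mask_data(hospital_a, mask), mask_data(hospital_b, mask)])
    plain = np.vstack([hospital_a, hospital_b])
    print(np.allclose(gram, plain @ plain.T))
    print(rbf_kernel(gram).shape)
```

Under these assumptions the central instance sees only 8-dimensional masked rows, yet the Gram matrix it computes equals the one obtained from the pooled raw data. A kernel matrix derived this way could then be handed to any learner that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.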