Identifying informative components in binary data is an essential task in many research areas, including life sciences, social sciences, and recommendation systems. Boolean matrix factorization (BMF) is a family of methods that performs this task by efficiently factorizing the data. In real-world settings, the data is often distributed across stakeholders and must stay private, which prohibits the straightforward application of BMF. To adapt BMF to this context, we approach the problem from a federated-learning perspective, building on a state-of-the-art continuous binary matrix factorization relaxation of BMF that enables efficient gradient-based optimization. We propose sharing only the relaxed component matrices, which are aggregated centrally using a proximal operator that regularizes for binary outcomes. We show the convergence of our federated proximal gradient descent algorithm and provide differential privacy guarantees. Our extensive empirical evaluation demonstrates that our algorithm outperforms, in terms of quality and efficacy, federation schemes of state-of-the-art BMF methods on a diverse set of real-world and synthetic data.
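The scheme outlined above (local gradient steps on a relaxed factorization, then central aggregation through a binary-promoting proximal operator) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `binary_prox`, `local_update`, and `server_aggregate`, the parameters `kappa`, `lr`, and `steps`, the squared-error loss, and plain averaging of the shared factor are all assumptions made for illustration. The regularizer used here, kappa * min(|x|, |x - 1|), is a simple stand-in whose proximal operator soft-thresholds each entry toward its nearest value in {0, 1}; the paper's actual regularizer may differ. The differential-privacy mechanism is omitted.

```python
import numpy as np


def binary_prox(x, kappa):
    """Proximal operator of kappa * min(|x|, |x - 1|), applied elementwise.

    Each entry is soft-thresholded toward its nearest value in {0, 1}.
    (Assumed stand-in regularizer, not necessarily the one in the paper.)
    """
    center = (x > 0.5).astype(x.dtype)  # nearest binary target per entry
    shifted = x - center
    return center + np.sign(shifted) * np.maximum(np.abs(shifted) - kappa, 0.0)


def local_update(A, U, V, lr, steps):
    """Client side: gradient steps on 0.5 * ||A - U V||_F^2 (assumed loss).

    A is the client's private binary data; U is its private factor;
    V is the shared relaxed component matrix.
    """
    U, V = U.copy(), V.copy()
    for _ in range(steps):
        R = U @ V - A        # residual of the relaxed reconstruction
        grad_U = R @ V.T
        grad_V = U.T @ R
        U -= lr * grad_U
        V -= lr * grad_V
    return U, V


def server_aggregate(client_Vs, kappa):
    """Server side: average the clients' shared factors, then apply the prox."""
    return binary_prox(np.mean(client_Vs, axis=0), kappa)
```

In one hypothetical round, each client would call `local_update` on its own data and send back only its updated `V`; the server would then call `server_aggregate` and broadcast the result, so the raw data `A` and the private factor `U` never leave the client.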