Machine learning is typically framed from the perspective of i.i.d. and, more importantly, isolated data. In part, federated learning lifts this assumption, as it sets out to solve the real-world challenge of collaboratively learning a shared model from data distributed across clients. However, motivated primarily by privacy and computational constraints, the fact that data may change, distributions may drift, or tasks may advance individually on clients is seldom taken into account. The field of continual learning addresses this separate challenge, and first steps have recently been taken to leverage synergies in distributed supervised settings, in which several clients learn to solve changing classification tasks over time without forgetting previously seen ones. Motivated by these prior works, we posit that such federated continual learning should be grounded in unsupervised learning of representations that are shared across clients, in the loose spirit of how humans can indirectly leverage others' experience without exposure to a specific task. For this purpose, we demonstrate that masked autoencoders for distribution estimation are particularly amenable to this setup. Specifically, their masking strategy can be seamlessly integrated with task attention mechanisms to enable selective knowledge transfer between clients. We empirically corroborate the latter statement through several federated continual scenarios on both image and binary datasets.
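To make the last two claims concrete, the following is a minimal sketch, not the paper's implementation, of how a MADE-style masked layer (Germain et al., 2015) can be combined with a per-task attention gate in the spirit of hard attention to the task (Serra et al., 2018). All names and hyperparameters here (TaskGatedMADE, n_tasks, the gate sharpness s) are illustrative assumptions, written in PyTorch.

```python
# Sketch only: MADE autoregressive masks combined with a learned per-task
# gate over hidden units, so each task/client can occupy a subnetwork.
# Names and hyperparameters are illustrative, not taken from the paper.
import torch
import torch.nn as nn


class MaskedLinear(nn.Linear):
    """Linear layer whose weights are multiplied by a fixed binary mask,
    as in MADE, to enforce autoregressive connectivity."""
    def __init__(self, in_features, out_features, mask):
        super().__init__(in_features, out_features)
        self.register_buffer("mask", mask)  # (out_features, in_features)

    def forward(self, x):
        return nn.functional.linear(x, self.weight * self.mask, self.bias)


def made_masks(d_in, d_hidden):
    """Degree-based MADE masks for one hidden layer and the output layer."""
    m_in = torch.arange(1, d_in + 1)                      # input degrees 1..D
    m_hid = torch.randint(1, d_in, (d_hidden,))           # hidden degrees in [1, D-1]
    mask_hid = (m_hid[:, None] >= m_in[None, :]).float()  # hidden connectivity
    mask_out = (m_in[:, None] > m_hid[None, :]).float()   # strict inequality at output
    return mask_hid, mask_out


class TaskGatedMADE(nn.Module):
    """One-hidden-layer MADE whose hidden units are modulated by a learned
    per-task attention vector (near-binary sigmoid gate)."""
    def __init__(self, d_in, d_hidden, n_tasks, s=10.0):
        super().__init__()
        mask_hid, mask_out = made_masks(d_in, d_hidden)
        self.hidden = MaskedLinear(d_in, d_hidden, mask_hid)
        self.out = MaskedLinear(d_hidden, d_in, mask_out)
        self.task_emb = nn.Embedding(n_tasks, d_hidden)   # one gate vector per task
        self.s = s                                        # gate sharpness

    def forward(self, x, task_id):
        gate = torch.sigmoid(self.s * self.task_emb(task_id))  # task attention
        h = torch.relu(self.hidden(x)) * gate             # selective hidden units
        return torch.sigmoid(self.out(h))                 # per-dim Bernoulli params


# Usage on binarized data: the per-dimension binary cross-entropy is the
# autoregressive negative log-likelihood under the MADE factorization.
model = TaskGatedMADE(d_in=784, d_hidden=500, n_tasks=5)
x = torch.bernoulli(torch.rand(8, 784))
p = model(x, torch.zeros(8, dtype=torch.long))
nll = nn.functional.binary_cross_entropy(p, x)
```

Under this reading, "selective knowledge transfer" corresponds to sharing the masked weights across clients while the near-binary task gates decide which hidden units each task reads from and protects from interference.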