The ubiquity of edge devices has led to a growing volume of unlabeled data produced at the edge. Deep learning models deployed on edge devices need to learn from this unlabeled data to continuously improve accuracy. Self-supervised representation learning has achieved promising performance on centralized unlabeled data. However, growing awareness of privacy protection limits the centralization of unlabeled image data distributed across edge devices. Federated learning has been widely adopted to enable distributed machine learning with privacy preservation, but without a method to efficiently select from streaming data, the traditional federated learning framework cannot handle these large amounts of decentralized unlabeled data under the limited storage resources of edge devices. To address these challenges, we propose SOFed, a Self-supervised On-device Federated learning framework with coreset selection, which automatically selects a coreset of the most representative samples to store in the replay buffer on each device. The framework preserves data privacy, since clients do not share raw data, while still learning good visual representations. Experiments demonstrate the effectiveness and significance of the proposed method in visual representation learning.