We investigate contrastive learning in the federated setting through the lens of SimCLR and multi-view mutual information maximization. In doing so, we uncover a connection between contrastive representation learning and user verification: by adding a user-verification loss to each client's local SimCLR loss, we recover a lower bound on the global multi-view mutual information. To accommodate the case where some labelled data are available at the clients, we extend our SimCLR variant to the federated semi-supervised setting. We see that a supervised SimCLR objective can be obtained with two changes: a) the contrastive loss is computed between datapoints that share the same label, and b) an additional auxiliary head is required that predicts the correct labels from either of the two views. Along with the proposed SimCLR extensions, we also study how different sources of non-i.i.d.-ness impact the performance of federated unsupervised learning through global mutual information maximization; we find that a global objective is beneficial for some sources of non-i.i.d.-ness but can be detrimental for others. We empirically evaluate our proposed extensions on various tasks to validate our claims, and we further demonstrate that our proposed modifications generalize to other pretraining methods.
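The local objectives described above can be illustrated with a minimal NumPy sketch that works on precomputed embeddings of the two views. It rests on several assumptions that the abstract does not state. The contrastive term is taken to be the standard NT-Xent loss with cosine similarity and temperature 0.5. The supervised variant treats every view whose datapoint shares the label as a positive, in the style of SupCon. The user-verification term is modeled as a softmax classifier over hypothetical per-client vectors `client_weights`, which the client is assumed to be able to see. The auxiliary head is a hypothetical linear classifier `head_weights`, and the terms are summed without weights. All function names are illustrative. No gradients or federated averaging are shown.

```python
import numpy as np


def _log_softmax(logits, axis=-1):
    # Numerically stable log-softmax; -inf entries map to -inf (probability 0).
    m = logits.max(axis=axis, keepdims=True)
    shifted = logits - m
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def contrastive_loss(z1, z2, labels=None, temperature=0.5):
    """NT-Xent (SimCLR-style) loss for two views z1, z2 of shape (N, d).

    labels=None: the only positive for each view is the other view of the
    same datapoint (unsupervised SimCLR). With labels: every other view whose
    datapoint shares the label is a positive (SupCon-style assumption).
    """
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)  # a view is never compared with itself
    log_prob = _log_softmax(sim, axis=1)

    ids = np.arange(n) if labels is None else np.asarray(labels)
    ids = np.concatenate([ids, ids])
    pos = ids[:, None] == ids[None, :]
    np.fill_diagonal(pos, False)
    # Mean log-probability of the positives per anchor, averaged over anchors.
    per_anchor = np.where(pos, log_prob, 0.0).sum(axis=1) / pos.sum(axis=1)
    return -per_anchor.mean()


def user_verification_loss(z1, z2, client_weights, client_id):
    """Hypothetical UV loss: identify the client from each view's representation.

    client_weights (K, d) holds one vector per client (assumed visible locally);
    every local datapoint has the same target, client_id.
    """
    z = np.concatenate([z1, z2], axis=0)
    log_prob = _log_softmax(z @ client_weights.T, axis=1)
    return -log_prob[:, client_id].mean()


def label_head_loss(z1, z2, head_weights, labels):
    """Hypothetical auxiliary linear head predicting the label from either view."""
    z = np.concatenate([z1, z2], axis=0)
    y = np.concatenate([labels, labels])
    log_prob = _log_softmax(z @ head_weights.T, axis=1)
    return -log_prob[np.arange(len(y)), y].mean()


def local_loss(z1, z2, client_weights, client_id, labels=None, head_weights=None):
    # Unweighted sum of the terms is an assumption; weights would be hyperparameters.
    loss = contrastive_loss(z1, z2, labels) + user_verification_loss(
        z1, z2, client_weights, client_id)
    if labels is not None:  # labelled batch: add the auxiliary label head
        loss += label_head_loss(z1, z2, head_weights, labels)
    return loss


rng = np.random.default_rng(0)
z1 = rng.normal(size=(8, 16))              # view-1 embeddings of 8 datapoints
z2 = z1 + 0.1 * rng.normal(size=(8, 16))   # view-2: a perturbed copy
client_weights = rng.normal(size=(4, 16))  # 4 clients (hypothetical)
labels = np.array([0, 0, 1, 1, 2, 2, 0, 1])
head_weights = rng.normal(size=(3, 16))    # 3 classes (hypothetical)

unsup = local_loss(z1, z2, client_weights, client_id=2)
semi = local_loss(z1, z2, client_weights, 2, labels, head_weights)
print(unsup, semi)
```

Passing `labels=np.arange(N)` to `contrastive_loss` reproduces the unsupervised loss. In this sense, the label-sharing positives of change a) generalize the one-positive SimCLR case rather than replace it.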