Federated Learning (FL) is a popular approach for training machine learning models on user data that must remain on edge devices (for example, mobile phones) due to privacy concerns. FL is typically applied under the assumption that no part of the user data may be egressed from the edge. In many production settings, however, only specific data modalities or metadata are restricted to the device, while others are not. For example, commercial spoken language understanding (SLU) systems are typically required not to transmit biometric signals, such as audio recordings of the input prompt, to the cloud, but they may be able to send text transcribed locally (i.e., on the edge device) to the cloud. In this work, we propose a new algorithm called Partial Federated Learning (PartialFL), in which a machine learning model is trained on data where a subset of the data modalities, or their intermediate representations, can be made available to the server. For better privacy, we further prevent data labels from being egressed to the cloud and instead train with a contrastive-learning-based objective. We evaluate our approach on two different multi-modal datasets and report promising results.
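The abstract does not specify the contrastive objective or the aggregation rule. As a hedged illustration only, the sketch below assumes a CLIP-style symmetric InfoNCE loss between paired embeddings of two modalities of the same utterance (for instance, an on-device audio embedding and a text embedding), which needs no labels because each utterance's two views form the positive pair, together with standard FedAvg weighted averaging of client parameters. Both choices, and the names `info_nce` and `fed_avg`, are assumptions for illustration and are not the PartialFL method itself.

```python
import math
from typing import List, Sequence

Vector = List[float]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def _normalize(v: Sequence[float]) -> Vector:
    # L2-normalize; the small floor avoids division by zero for all-zero vectors.
    norm = max(math.sqrt(_dot(v, v)), 1e-12)
    return [x / norm for x in v]


def _mean_row_cross_entropy(logits: List[Vector]) -> float:
    # Row i's "correct class" is column i (the matching pair); no labels are needed.
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def info_nce(audio_emb: List[Vector], text_emb: List[Vector],
             temperature: float = 0.1) -> float:
    """Hypothetical symmetric InfoNCE loss: audio_emb[i] and text_emb[i] come
    from the same utterance (positive pair); all other pairs in the batch
    act as negatives."""
    a = [_normalize(v) for v in audio_emb]
    t = [_normalize(v) for v in text_emb]
    n = len(a)
    logits = [[_dot(a[i], t[j]) / temperature for j in range(n)] for i in range(n)]
    transposed = [list(col) for col in zip(*logits)]
    # Average the audio-to-text and text-to-audio directions.
    return 0.5 * (_mean_row_cross_entropy(logits) + _mean_row_cross_entropy(transposed))


def fed_avg(client_weights: List[Vector], client_sizes: List[int]) -> Vector:
    """Hypothetical FedAvg step: average flattened client parameters,
    weighted by each client's number of local examples."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[k] * s for w, s in zip(client_weights, client_sizes)) / total
        for k in range(dim)
    ]


if __name__ == "__main__":
    audio = [[1.0, 0.0], [0.0, 1.0]]
    print(info_nce(audio, [[1.0, 0.0], [0.0, 1.0]]))  # aligned pairs: low loss
    print(info_nce(audio, [[0.0, 1.0], [1.0, 0.0]]))  # swapped pairs: high loss
    print(fed_avg([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

In this arrangement only embeddings of the egressable modality would need to reach the server, and the loss is defined purely by which views belong to the same utterance, so no labels leave the device. Where each encoder runs and what is shared are assumptions of the sketch.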