We present a new approach based on the personalized federated learning algorithm MeritFed that can be applied to natural language tasks with heterogeneous data. We evaluate it on low-resource machine translation, using the dataset from the Large-Scale Multilingual Machine Translation Shared Task (Small Track #2) and the Sami-language subset of a multilingual benchmark for Finno-Ugric languages. In addition to its effectiveness, MeritFed is also highly interpretable, as it tracks the impact of each language used for training. Our analysis reveals that the target dataset size affects the weight distribution across auxiliary languages, that unrelated languages do not interfere with training, and that auxiliary optimizer parameters have minimal impact. Our approach is easy to apply with a few lines of code, and we provide scripts for reproducing the experiments at https://github.com/VityaVitalich/MeritFed
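To illustrate the mechanism the abstract describes, here is a minimal toy sketch of MeritFed-style aggregation. It is not the authors' implementation: the quadratic losses, the three-client setup, and the exponentiated-gradient (mirror-descent) weight update over the simplex are all illustrative assumptions. Two "languages" share the target optimum and one is unrelated; the learned simplex weights, driven by alignment with the target validation gradient, suppress the unrelated client, mirroring the paper's observation that unrelated languages do not interfere with training.

```python
# Toy sketch of interpretable weighted aggregation in the spirit of
# MeritFed (hypothetical code, not the authors' implementation).
import numpy as np

def mirror_descent_step(weights, dots, lr=0.5):
    """Exponentiated-gradient update on the simplex: clients whose
    gradients align with the target validation gradient gain weight."""
    w = weights * np.exp(lr * dots)
    return w / w.sum()

target = np.array([1.0, -2.0])      # toy target-language optimum
unrelated = np.array([-3.0, 4.0])   # optimum of an unrelated "language"

theta = np.zeros(2)                 # shared model parameters
weights = np.ones(3) / 3            # start with uniform simplex weights
for _ in range(200):
    val_grad = theta - target       # gradient of target validation loss
    # Two helpful clients share the target loss; one client is unrelated.
    grads = [val_grad, val_grad, theta - unrelated]
    # Alignment of each client's gradient with the validation gradient.
    dots = np.array([g @ val_grad for g in grads])
    weights = mirror_descent_step(weights, dots)
    # Aggregate client gradients with the learned weights and step.
    theta = theta - 0.1 * sum(w * g for w, g in zip(weights, grads))
```

After training, the unrelated client's weight collapses toward zero and `theta` converges to the target optimum; inspecting `weights` is exactly the kind of per-language impact tracking the abstract refers to.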