Evaluation is a systematic approach to assessing how well a system achieves its intended purpose. Federated learning (FL) is a novel paradigm for privacy-preserving machine learning that allows multiple parties to collaboratively train models without sharing sensitive data. However, evaluating FL is challenging because of its interdisciplinary nature and its diverse goals, such as utility, efficiency, and security. In this survey, we first review the major evaluation goals adopted in existing studies and then examine the evaluation metrics used for each goal. We also introduce FedEval, an open-source platform that provides a standardized and comprehensive framework for evaluating FL algorithms in terms of utility, efficiency, and security. Finally, we discuss several challenges and future research directions for FL evaluation.