Evaluation is a systematic approach to assessing how well a system achieves its intended purpose. Federated learning (FL) is a novel paradigm for privacy-preserving machine learning that allows multiple parties to collaboratively train models without sharing their sensitive data. However, evaluating FL is challenging because of its interdisciplinary nature and its diverse goals, such as utility, efficiency, and security. In this survey, we first review the major evaluation goals adopted in existing studies and then examine the evaluation metrics used for each goal. We also introduce FedEval, an open-source platform that provides a standardized and comprehensive framework for evaluating FL algorithms in terms of utility, efficiency, and security. Finally, we discuss several open challenges and future research directions for FL evaluation.