A large number of federated learning (FL) algorithms have been proposed for different applications and from varying perspectives. However, these approaches are often evaluated with a single metric (e.g., accuracy). This practice fails to account for the unique demands and diverse requirements of different use cases. Thus, how to comprehensively evaluate an FL algorithm and determine the most suitable candidate for a designated use case remains an open question. To address this research gap, we introduce the Holistic Evaluation Metrics (HEM) for FL. Specifically, we focus on three primary use cases: Internet of Things (IoT), smart devices, and institutions. The metrics cover several aspects, including accuracy, convergence, computational efficiency, fairness, and personalization. We then assign each use case an importance vector that reflects its distinct performance requirements and priorities. Finally, the HEM index is computed by combining these metric components with the corresponding importance vector. Experiments evaluating different FL algorithms across these three prevalent use cases demonstrate that HEM can effectively assess FL algorithms and identify those best suited to particular scenarios. We anticipate that this work will shed light on the evaluation of practical FL algorithms in real-world applications.
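The abstract does not state how the metric components and the importance vector are combined. The sketch below illustrates one plausible reading: a weighted sum of per-aspect scores. It assumes each component is already normalized to [0, 1] with higher meaning better. The function name `hem_index`, the importance weights, and the algorithm scores are all hypothetical placeholders, not the paper's method or data.

```python
"""Hypothetical sketch of combining per-aspect scores into a single index.

Assumptions (not taken from the paper): each component is normalized to
[0, 1] with higher meaning better, and the index is a weighted sum whose
weights (the importance vector) are rescaled to sum to 1.
"""

ASPECTS = ("accuracy", "convergence", "efficiency", "fairness", "personalization")


def hem_index(scores: dict[str, float], importance: dict[str, float]) -> float:
    """Weighted sum of normalized metric components (illustrative only)."""
    total_weight = sum(importance[a] for a in ASPECTS)
    if total_weight <= 0:
        raise ValueError("importance weights must sum to a positive value")
    return sum(importance[a] * scores[a] for a in ASPECTS) / total_weight


# Hypothetical importance vectors; the paper's actual values are not reproduced here.
IMPORTANCE = {
    "iot": {"accuracy": 0.2, "convergence": 0.2, "efficiency": 0.4,
            "fairness": 0.1, "personalization": 0.1},
    "institution": {"accuracy": 0.4, "convergence": 0.1, "efficiency": 0.1,
                    "fairness": 0.2, "personalization": 0.2},
}

# Hypothetical normalized scores for two made-up algorithms.
SCORES = {
    "alg_a": {"accuracy": 0.9, "convergence": 0.6, "efficiency": 0.3,
              "fairness": 0.7, "personalization": 0.5},
    "alg_b": {"accuracy": 0.7, "convergence": 0.8, "efficiency": 0.9,
              "fairness": 0.6, "personalization": 0.4},
}

# Pick the highest-scoring algorithm for each use case.
for use_case, weights in IMPORTANCE.items():
    best = max(SCORES, key=lambda alg: hem_index(SCORES[alg], weights))
    print(use_case, best)
```

With these made-up numbers, `alg_b` ranks first for the IoT weights (0.76 vs. 0.54) because efficiency dominates. `alg_a` ranks first for the institution weights (0.69 vs. 0.65) because accuracy dominates. This illustrates how a single set of scores can yield different recommendations per use case.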