Federated Learning (FL) is a decentralized machine learning approach that has gained attention for its potential to enable collaborative model training across clients while protecting data privacy, which makes it an attractive solution for the chemical industry. This work aims to provide the chemical engineering community with an accessible introduction to FL. Supported by a hands-on tutorial and a comprehensive collection of examples, it explores the application of FL to tasks such as manufacturing optimization, multimodal data integration, and drug discovery, while addressing the challenges of protecting proprietary information and managing distributed datasets. The tutorial is built on frameworks such as $\texttt{Flower}$ and $\texttt{TensorFlow Federated}$ and is designed to give chemical engineers the tools to adopt FL for their specific needs. We compare the performance of FL against centralized learning on three datasets relevant to chemical engineering applications and find that FL often maintains or improves classification performance, particularly for complex and heterogeneous data. We conclude with an outlook on the open challenges in FL and on current approaches designed to address them.