Federated Learning is a distributed machine learning approach that enables geographically distributed data silos to collaboratively train a joint model without sharing their data. Most existing work operates either on unstructured data, such as images or text, or on structured data that is assumed to be consistent across sites. In practice, however, sites often differ in their schemata, data formats, data values, and access patterns. The field of data integration has developed many methods to address these challenges, including techniques for data exchange and query rewriting using declarative schema mappings, and for entity linkage. We therefore propose an architectural vision for an end-to-end Federated Learning and Integration system that incorporates the critical steps of data harmonization and data imputation, in order to spur further research at the intersection of data management, information systems, and machine learning.
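To make the harmonize-then-learn idea concrete, here is a minimal toy sketch. It is not the proposed system's design. Everything in it is a hypothetical assumption: two sites, a shared target schema of `(temperature_celsius, label)`, and hand-written mapping functions `map_site_a` and `map_site_b` that stand in for declarative schema mappings, which real systems would express declaratively rather than as Python code. It also swaps in plain federated averaging (FedAvg) over a one-feature linear model as the learning step. Each site maps its own rows into the target schema and trains locally. Only model parameters reach the aggregator.

```python
"""Toy sketch: per-site schema mapping followed by federated averaging (FedAvg).

Hypothetical assumptions (not from the source): two sites, a one-feature linear
model, and hand-written mapping functions to a shared target schema.
"""

# Target (mediated) schema shared by all sites: (temperature_celsius, label).
Example = tuple[float, float]


def map_site_a(row: dict) -> Example:
    # Hypothetical site A stores Celsius but uses its own column names.
    return float(row["temp_c"]), float(row["y"])


def map_site_b(row: dict) -> Example:
    # Hypothetical site B stores Fahrenheit, so its mapping also converts units.
    return (float(row["temp_f"]) - 32.0) * 5.0 / 9.0, float(row["target"])


def local_update(w: float, b: float, data: list[Example],
                 lr: float = 0.02, epochs: int = 5) -> tuple[float, float]:
    """Full-batch gradient descent on mean squared error, run at one site."""
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b


def fedavg(sites: list[list[Example]], rounds: int = 300) -> tuple[float, float]:
    """Average local models weighted by sample count; raw rows never leave a site."""
    w, b = 0.0, 0.0
    total = sum(len(d) for d in sites)
    for _ in range(rounds):
        # Each site starts from the current global model and trains locally.
        updates = [(local_update(w, b, d), len(d)) for d in sites]
        w = sum(uw * n for (uw, _), n in updates) / total
        b = sum(ub * n for (_, ub), n in updates) / total
    return w, b


# Raw rows stay in each site's local schema (toy data on the line y = 2x + 1).
site_a_rows = [{"temp_c": x, "y": 2 * x + 1} for x in (0, 1, 2, 3)]
site_b_rows = [{"temp_f": 39.2, "target": 9.0}, {"temp_f": 41.0, "target": 11.0}]

# Harmonization happens locally, before any training.
site_a = [map_site_a(r) for r in site_a_rows]
site_b = [map_site_b(r) for r in site_b_rows]

w, b = fedavg([site_a, site_b])
print(f"w={w:.3f}, b={b:.3f}")  # should end up close to w=2, b=1
```

Without the unit conversion in `map_site_b`, site B's Fahrenheit values would silently corrupt the joint model. This kind of cross-site inconsistency is what the harmonization step is meant to catch before learning begins.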