Increasing legislation and regulation of private and proprietary information has resulted in scattered data sources, also known as "data islands". Although Federated Learning-based paradigms can enable privacy-preserving collaboration over decentralized data, they have inherent deficiencies in fairness, cost, and reproducibility because they are learning-centric, which greatly limits how participants can cooperate with each other. In light of this, we investigate the possibility of shifting from resource-intensive learning to task-agnostic collaboration, especially when the participants do not share a common goal. We term this new scenario Task-Agnostic Federation (TAF) and investigate several branches of research that serve as its technical building blocks. These techniques directly or indirectly embrace data-centric approaches that can operate independently of any learning task. In this article, we first describe the system architecture and problem setting for TAF. Then, we present a three-way roadmap and categorize recent studies into three directions: collaborative data expansion, collaborative data refinement, and collective data harmonization within the federation. Further, we highlight several challenges and open questions that deserve more attention from the community. With this investigation, we intend to offer new insights into how autonomous parties with varied motivations can cooperate over decentralized data beyond learning.