Dataset Dictionary Learning in a Wasserstein Space for Federated Domain Adaptation

Multi-Source Domain Adaptation (MSDA) is a challenging scenario where multiple related and heterogeneous source datasets must be adapted to an unlabeled target dataset. Conventional MSDA methods often overlook data holders' privacy concerns, which hinder direct data sharing. In response, decentralized MSDA has emerged as a promising strategy to achieve adaptation without centralizing clients' data. Our work proposes a novel approach, Decentralized Dataset Dictionary Learning, to address this challenge. Our method leverages Wasserstein barycenters to model the distributional shift across multiple clients, enabling effective adaptation while preserving data privacy. Specifically, our algorithm expresses each client's underlying distribution as a Wasserstein barycenter of public atoms, weighted by private barycentric coordinates. Our approach ensures that the barycentric coordinates remain undisclosed throughout the adaptation process. Extensive experiments across five visual domain adaptation benchmarks demonstrate the superiority of our strategy over existing decentralized MSDA techniques. Moreover, our method is more robust to client parallelism and remains comparatively resilient relative to conventional decentralized MSDA methods.
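
To make the idea of "a Wasserstein barycenter of public atoms, weighted by private barycentric coordinates" concrete, here is a minimal sketch in one dimension. It is not the paper's algorithm: the actual method operates on labeled, high-dimensional datasets and learns the atoms, whereas this toy relies on the 1-D special case where the Wasserstein-2 barycenter of empirical distributions with equal sample counts is the weighted average of their sorted samples. The names `barycenter_1d`, `public_atoms`, and `private_coords` are hypothetical and chosen only for illustration.

```python
import numpy as np


def barycenter_1d(atoms, weights):
    """Wasserstein-2 barycenter of 1-D empirical distributions.

    Assumes every atom has the same number of equally weighted samples.
    atoms:   array-like of shape (K, n), one row of samples per atom
    weights: array-like of shape (K,), non-negative and summing to 1
    """
    atoms = np.asarray(atoms, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if atoms.ndim != 2 or weights.shape != (atoms.shape[0],):
        raise ValueError("expected atoms of shape (K, n) and weights of shape (K,)")
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must lie on the probability simplex")
    # In 1-D, optimal transport matches sorted samples, so the barycenter's
    # quantile function is the weighted average of the atoms' quantile functions.
    return weights @ np.sort(atoms, axis=1)


# Toy setup (illustrative assumption): two public atoms shared by all clients.
rng = np.random.default_rng(0)
public_atoms = np.stack([
    rng.normal(-2.0, 1.0, size=500),
    rng.normal(3.0, 1.0, size=500),
])

# Each client keeps its own coordinates private; only the atoms are public.
private_coords = np.array([0.25, 0.75])
client_samples = barycenter_1d(public_atoms, private_coords)
```

In this toy setting, changing `private_coords` slides the client's distribution between the two atoms, which is the sense in which the coordinates encode a client's distributional shift.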
