Semi-supervised Learning of Pushforwards For Domain Translation & Adaptation

Given two probability densities on related data spaces, we seek a map that pushes one density to the other while satisfying application-dependent constraints. For such a map to be useful across a broad range of applications (including domain translation, domain adaptation, and generative modeling), it must be applicable to out-of-sample data points and should correspond to a probabilistic model over the two spaces. Unfortunately, existing approaches, which are primarily based on optimal transport, do not address these needs. In this paper, we introduce a novel pushforward map learning algorithm that uses normalizing flows to parameterize the map. We first reformulate the classical optimal transport problem to be map-focused and propose a learning algorithm that selects, from all possible maps, one that minimizes a probability distance together with application-specific regularizers; thus, our method can be seen as solving a modified optimal transport problem. Once the map is learned, it can be used to map samples from a source domain to a target domain. In addition, because the map is parameterized as a composition of normalizing flows, it models the empirical distributions over the two data spaces and allows both sampling and likelihood evaluation for both data sets. We compare our method (parOT) to related optimal transport approaches in the context of domain adaptation and domain translation on benchmark data sets. Finally, to illustrate the impact of our work on applied problems, we apply parOT to a real scientific application: spectral calibration for high-dimensional measurements from two vastly different environments.
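
To make the idea of a map built from a composition of flows concrete, the following is a minimal sketch and not the parOT algorithm itself. It assumes one invertible flow per data space, each sending its data to a shared standard normal base, and composes the source flow with the inverse of the target flow. The class `AffineFlow`, the function `pushforward`, and the synthetic Gaussian data are hypothetical names and choices made for illustration; the flows are simple elementwise affine maps fit by maximum likelihood, and no probability distance or application-specific regularizer from the abstract is included.

```python
import numpy as np


class AffineFlow:
    """Hypothetical elementwise affine flow z = (x - shift) / scale
    with a standard normal base distribution."""

    def fit(self, data):
        # For this flow family, the maximum-likelihood parameters are the
        # per-dimension sample mean and (population) standard deviation.
        self.shift = data.mean(axis=0)
        self.scale = data.std(axis=0)
        return self

    def forward(self, x):
        # Data space -> latent (base) space.
        return (x - self.shift) / self.scale

    def inverse(self, z):
        # Latent (base) space -> data space.
        return z * self.scale + self.shift

    def log_prob(self, x):
        # Change of variables: log p(x) = log N(z; 0, I) + log|det dz/dx|,
        # where log|det dz/dx| = -sum(log scale) for this flow.
        z = self.forward(x)
        base = -0.5 * (z ** 2 + np.log(2.0 * np.pi))
        return (base - np.log(self.scale)).sum(axis=-1)

    def sample(self, n, rng):
        # Draw from the base distribution and push through the inverse flow.
        return self.inverse(rng.standard_normal((n, self.shift.size)))


def pushforward(x, source_flow, target_flow):
    # Assumed construction: send source points to the shared latent space,
    # then decode them with the inverse of the target flow.
    return target_flow.inverse(source_flow.forward(x))


rng = np.random.default_rng(0)
# Synthetic stand-ins for the two data sets (illustrative only).
source = rng.normal(loc=[0.0, 2.0], scale=[1.0, 0.5], size=(5000, 2))
target = rng.normal(loc=[3.0, -1.0], scale=[2.0, 1.5], size=(5000, 2))

source_flow = AffineFlow().fit(source)
target_flow = AffineFlow().fit(target)

# Out-of-sample points: the learned map applies to data not seen in fitting.
new_points = rng.normal(loc=[0.0, 2.0], scale=[1.0, 0.5], size=(1000, 2))
mapped = pushforward(new_points, source_flow, target_flow)

# Both data spaces support likelihood evaluation and sampling.
source_loglik = source_flow.log_prob(new_points)
target_samples = target_flow.sample(10, rng)
```

Because each flow is invertible with a tractable Jacobian, the same two objects give the map, its inverse, samples, and log-likelihoods on both spaces. A practical version would replace the affine flows with more expressive ones and add the distance and regularization terms described above.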
