Optimal transport (OT) theory describes general principles to define and select, among many possible choices, the most efficient way to map one probability measure onto another. The theory has mostly been used to estimate, given a pair of source and target probability measures $(\mu, \nu)$, a parameterized map $T_\theta$ that can efficiently map $\mu$ onto $\nu$. In many applications, such as predicting cell responses to treatments, the pairs of input/output data measures $(\mu, \nu)$ that define optimal transport problems do not arise in isolation but are associated with a context $c$, for instance the treatment applied when comparing populations of untreated and treated cells. To account for that context in OT estimation, we introduce CondOT, a multi-task approach to estimate a family of OT maps conditioned on a context variable, using several pairs of measures $\left(\mu_i, \nu_i\right)$ tagged with a context label $c_i$. CondOT learns a global map $\mathcal{T}_\theta$ conditioned on context that is not only expected to fit all labeled pairs in the dataset $\left\{\left(c_i,\left(\mu_i, \nu_i\right)\right)\right\}$, i.e., $\mathcal{T}_\theta\left(c_i\right) \sharp \mu_i \approx \nu_i$, but should also generalize to produce meaningful maps $\mathcal{T}_\theta\left(c_{\text {new }}\right)$ when conditioned on unseen contexts $c_{\text {new }}$. Our approach harnesses, and provides a novel use for, partially input convex neural networks, for which we introduce a robust and efficient initialization strategy inspired by Gaussian approximations. We demonstrate the ability of CondOT to infer the effect of an arbitrary combination of genetic or therapeutic perturbations on single cells, using only observations of the effect of each perturbation applied separately.
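To make the push-forward condition $T \sharp \mu \approx \nu$ and the notion of a Gaussian approximation concrete, the following is a minimal sketch for a single pair of one-dimensional measures. It relies on the standard closed form for the OT map between two Gaussians under squared Euclidean cost, $T(x) = m_\nu + (\sigma_\nu / \sigma_\mu)(x - m_\mu)$. It is not the CondOT initialization itself, which the abstract does not detail; the function names `fit_gaussian_1d` and `gaussian_ot_map_1d` and the synthetic "untreated"/"treated" samples are assumptions made purely for illustration.

```python
import random
import statistics


def fit_gaussian_1d(samples):
    """Return (mean, std) of a 1-D sample: a Gaussian approximation of the measure."""
    return statistics.fmean(samples), statistics.pstdev(samples)


def gaussian_ot_map_1d(source, target):
    """Closed-form OT map (squared Euclidean cost) between 1-D Gaussian fits.

    T(x) = m_nu + (s_nu / s_mu) * (x - m_mu) is a non-decreasing affine map,
    i.e. the derivative of a convex quadratic potential.
    """
    m_mu, s_mu = fit_gaussian_1d(source)
    m_nu, s_nu = fit_gaussian_1d(target)
    if s_mu == 0:
        raise ValueError("source samples must not be constant")
    scale = s_nu / s_mu

    def transport(x):
        return m_nu + scale * (x - m_mu)

    return transport


# Hypothetical data: synthetic stand-ins for untreated and treated populations.
rng = random.Random(0)
mu_samples = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # source measure mu
nu_samples = [rng.gauss(3.0, 0.5) for _ in range(2000)]  # target measure nu

T = gaussian_ot_map_1d(mu_samples, nu_samples)
pushed = [T(x) for x in mu_samples]  # samples from T # mu
```

By construction, `pushed` has the same empirical mean and standard deviation as `nu_samples`, so the map matches the target up to second moments. A learned map would be needed to capture non-Gaussian structure, and a conditional model would additionally make the map depend on the context $c$.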