Among all maps $T:\mathbb{R}^d\rightarrow \mathbb{R}^d$ that can morph one probability measure onto another, optimal transport (OT) theory focuses on those that are the ``thriftiest'', i.e., such that the average cost $c(x, T(x))$ between $x$ and its image $T(x)$ is as small as possible. Many computational approaches have been proposed to estimate such Monge maps when $c$ is the squared Euclidean ($\ell_2^2$) distance, e.g., using entropic maps [Pooladian'22] or neural networks [Makkuva'20, Korotin'20]. We propose a new model for transport maps, built on a family of translation-invariant costs $c(x, y):=h(x-y)$, where $h:=\tfrac{1}{2}\|\cdot\|_2^2+\tau$ and $\tau$ is a regularizer. We propose a generalization of the entropic map suitable for $h$, and highlight a surprising link tying it to the Bregman centroids of the divergence $D_h$ generated by $h$ and to the proximal operator of $\tau$. We show that choosing a sparsity-inducing norm for $\tau$ results in maps that apply Occam's razor to transport, in the sense that the displacement vectors $\Delta(x):= T(x)-x$ they induce are sparse, with a sparsity pattern that varies depending on $x$. We showcase the ability of our method to estimate meaningful OT maps for high-dimensional single-cell transcription data, in the $34000$-dimensional space of gene counts for cells, without using dimensionality reduction, thus retaining the ability to interpret all displacements at the gene level.
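To illustrate why a sparsity-inducing $\tau$ yields sparse displacements, the following minimal sketch assumes $\tau=\gamma\|\cdot\|_1$ and relies on two standard facts rather than on the estimator proposed in the paper: for a strictly convex, translation-invariant cost $h$, Monge maps take the form $T(x)=x-\nabla h^*(\nabla f(x))$ for some potential $f$, and for $h=\tfrac{1}{2}\|\cdot\|_2^2+\tau$ with convex $\tau$ one has $\nabla h^*=\mathrm{prox}_\tau$. The function names, the value of $\gamma$, and the gradient vector below are hypothetical and chosen purely for illustration.

```python
import numpy as np


def prox_l1(z, gamma):
    """Proximal operator of tau = gamma * ||.||_1, i.e. soft-thresholding."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)


def displacement(grad_f_x, gamma):
    """Displacement T(x) - x = -prox_tau(grad f(x)) for h = 0.5*||.||^2 + tau.

    `grad_f_x` stands in for the gradient of a potential f at x; here it is
    supplied by hand and is not estimated from data.
    """
    return -prox_l1(grad_f_x, gamma)


# Hypothetical gradient of the potential at a single point x (made-up values).
g = np.array([0.05, -1.3, 0.2, 0.0, 2.1])

# Coordinates of g with magnitude at most gamma are zeroed out, so the
# displacement only moves x along a few coordinates.
delta = displacement(g, gamma=0.5)
n_active = np.count_nonzero(delta)
```

Because the threshold is applied to $\nabla f(x)$, the set of active coordinates changes with $x$, which matches the $x$-dependent sparsity pattern described in the abstract. Setting `gamma=0.0` recovers the usual squared-Euclidean displacement $-\nabla f(x)$.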