Optimal transport (OT) theory has reshaped the field of generative modeling: combined with neural networks, recent \textit{Neural OT} (N-OT) solvers use OT as an inductive bias to focus on ``thrifty'' mappings that minimize average displacement costs. This core principle has fueled the successful application of N-OT solvers to high-stakes scientific challenges, notably single-cell genomics. N-OT solvers are, however, increasingly confronted with practical challenges: most of them handle squared-Euclidean costs but must be repurposed to handle more general costs; their reliance on deterministic Monge maps and on mass conservation constraints can easily go awry in the presence of outliers; and mapping points \textit{across} heterogeneous spaces remains out of their reach. While each of these challenges has been explored independently, we propose a new framework that natively handles all of them. The \textit{generative entropic neural OT} (GENOT) framework models the conditional distribution $\pi_\varepsilon(\*y|\*x)$ of an optimal \textit{entropic} coupling $\pi_\varepsilon$ using conditional flow matching. GENOT is generative and can transport points \textit{across} spaces, guided by sample-based, unbalanced solutions to the Gromov-Wasserstein problem that can use any cost. We showcase our approach on both synthetic and single-cell datasets, using GENOT to model cell development, predict cellular responses, and translate between data modalities.
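As a rough illustration of the two objects named in the abstract, the entropic coupling $\pi_\varepsilon$ and its conditionals $\pi_\varepsilon(\*y|\*x)$, the sketch below computes a discrete entropic coupling between two small point clouds with plain Sinkhorn iterations and then samples from one of its rows. It is a minimal sketch and not the GENOT implementation: the function names `sinkhorn_coupling` and `sample_conditional`, the squared-Euclidean cost, the cost normalization, the uniform weights, and the toy Gaussian data are all assumptions made for illustration. The conditional flow matching network that GENOT would train on such conditional samples, as well as the unbalanced and Gromov-Wasserstein variants, are omitted.

```python
import numpy as np


def sinkhorn_coupling(x, y, epsilon=0.1, n_iters=500):
    """Discrete entropic OT coupling between uniform empirical measures.

    Assumption for this sketch: squared-Euclidean cost, rescaled by its mean
    so that epsilon does not depend on the scale of the data.
    """
    n, m = x.shape[0], y.shape[0]
    a = np.full(n, 1.0 / n)  # uniform source weights
    b = np.full(m, 1.0 / m)  # uniform target weights
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    cost = cost / cost.mean()
    kernel = np.exp(-cost / epsilon)  # Gibbs kernel
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):
        u = a / (kernel @ v)    # rescale rows towards marginal a
        v = b / (kernel.T @ u)  # rescale columns to match marginal b
    return u[:, None] * kernel * v[None, :]


def sample_conditional(coupling, y, i, rng, n_samples=1):
    """Draw targets from the discrete conditional pi(y | x_i)."""
    p = coupling[i] / coupling[i].sum()  # normalize row i to a distribution
    idx = rng.choice(y.shape[0], size=n_samples, p=p)
    return y[idx]


# Toy data (hypothetical): two 2-D Gaussian clouds, the target one shifted.
rng = np.random.default_rng(0)
x = rng.normal(size=(64, 2))
y = rng.normal(size=(48, 2)) + np.array([2.0, 0.0])

coupling = sinkhorn_coupling(x, y)
samples = sample_conditional(coupling, y, i=0, rng=rng, n_samples=5)
print(samples.shape)  # (5, 2)
```

Each row of `coupling`, once normalized, is a distribution over target points rather than a single image point, which is the sense in which an entropic coupling is stochastic where a Monge map is deterministic.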