Learning measure-to-measure mappings is a central task in machine learning and features prominently in generative modeling. Recent years have seen a surge of techniques that draw inspiration from optimal transport (OT) theory. Combined with neural network models, these methods, collectively known as \textit{Neural OT}, use optimal transport as an inductive bias: the learned mappings should be optimal w.r.t. a given cost function, in the sense that they move points in a thrifty way, either within a space (by minimizing displacements) or across spaces (by being isometric). This principle, while intuitive, runs into several practical challenges that require adapting the OT toolbox: cost functions other than the squared-Euclidean cost can be difficult to handle; the deterministic formulation of Monge maps leaves little flexibility; mapping across incomparable spaces raises multiple challenges; and the mass conservation constraint inherent to OT can give too much weight to outliers. While each of these mismatches between practice and theory has been addressed independently in various works, we propose a single framework that unifies them, called \textit{generative entropic neural optimal transport} (GENOT). GENOT can accommodate any cost function, handles randomness using conditional generative models, can map points across incomparable spaces, and can be used as an \textit{unbalanced} solver. We evaluate our approach through experiments on various synthetic datasets and demonstrate its practicality in single-cell biology, where GENOT proves valuable for tasks such as modeling cell development, predicting cellular responses to drugs, and translating between different data modalities of cells.
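To make the "entropic", "any cost function", and "randomness" ingredients concrete, the following is a minimal sketch of a discrete entropic OT coupling computed with Sinkhorn iterations under a user-supplied cost, followed by sampling a target point from the conditional distribution of one source point. It is not GENOT's implementation, which the abstract does not describe: the function names `sinkhorn_coupling` and `l1_cost`, the uniform weights, the regularization strength `epsilon`, and the toy Gaussian data are all assumptions made for illustration.

```python
import numpy as np


def sinkhorn_coupling(x, y, cost_fn, epsilon=0.5, n_iters=500):
    """Entropic OT coupling between uniform empirical measures on x and y.

    Hypothetical helper for illustration; cost_fn may be any pairwise cost
    that broadcasts over arrays of shape (n, 1, d) and (1, m, d).
    """
    n, m = len(x), len(y)
    a = np.full(n, 1.0 / n)  # uniform source weights (assumption)
    b = np.full(m, 1.0 / m)  # uniform target weights (assumption)
    cost = cost_fn(x[:, None, :], y[None, :, :])  # shape (n, m)
    kernel = np.exp(-cost / epsilon)  # Gibbs kernel
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (kernel @ v)    # rescale to match the source marginal
        v = b / (kernel.T @ u)  # rescale to match the target marginal
    return u[:, None] * kernel * v[None, :]


def l1_cost(xi, yj):
    """A cost other than squared Euclidean: the L1 distance."""
    return np.abs(xi - yj).sum(axis=-1)


rng = np.random.default_rng(0)
x = rng.normal(size=(50, 2))           # toy source samples
y = rng.normal(loc=1.0, size=(60, 2))  # toy target samples

coupling = sinkhorn_coupling(x, y, l1_cost)

# The coupling is stochastic rather than a deterministic Monge map:
# each source point has a whole conditional distribution over targets.
conditional = coupling[0] / coupling[0].sum()
sampled_target = y[rng.choice(len(y), p=conditional)]
```

Under these assumptions, each row of `coupling`, once normalized, is a distribution over target points. One way to read the abstract is that a conditional generative model plays the role of a sampler for such conditionals, but that reading is an interpretation and not a statement of the method.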