We devise a theoretical framework and a numerical method to infer the trajectories of a stochastic process from samples of its temporal marginals. This problem arises in the analysis of single-cell RNA-sequencing data, which provide high-dimensional measurements of cell states but cannot track the trajectories of the cells over time. We prove that, for a class of stochastic processes, it is possible to recover the ground-truth trajectories from limited samples of the temporal marginals at each time-point, and we provide an efficient algorithm to do so in practice. The method we develop, Global Waddington-OT (gWOT), boils down to a smooth convex optimization problem, posed globally over all time-points, that involves entropy-regularized optimal transport. We demonstrate on several synthetic and real datasets that this problem can be solved efficiently in practice and yields good reconstructions.
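As a rough illustration of the entropy-regularized optimal transport ingredient mentioned above, the following is a minimal sketch of Sinkhorn iterations that couple samples from two consecutive time-points. It is not the gWOT algorithm itself, which is posed globally over all time-points. The function name `sinkhorn_coupling`, the squared Euclidean cost, the uniform sample weights, and the values of `eps` and `n_iters` are all assumptions made for this example.

```python
import numpy as np


def sinkhorn_coupling(x, y, eps=2.0, n_iters=1000):
    """Entropy-regularized OT coupling between two point clouds.

    Hypothetical helper: uniform weights and a squared Euclidean cost
    are assumed here, they are not taken from the paper.
    """
    a = np.full(len(x), 1.0 / len(x))  # uniform weights on samples at time t
    b = np.full(len(y), 1.0 / len(y))  # uniform weights on samples at time t+1
    # Pairwise squared Euclidean distances, shape (len(x), len(y)).
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    kernel = np.exp(-cost / eps)  # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        v = b / (kernel.T @ u)  # rescale to match the column marginal
        u = a / (kernel @ v)    # rescale to match the row marginal
    # Coupling matrix diag(u) @ kernel @ diag(v).
    return u[:, None] * kernel * v[None, :]


# Synthetic samples standing in for two temporal marginals.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(30, 2))
y = rng.normal(0.5, 1.0, size=(40, 2))
coupling = sinkhorn_coupling(x, y)
```

Each row of `coupling`, once normalized, can be read as a soft assignment of one sample at the earlier time-point to the samples at the later one. A larger `eps` gives a more diffuse coupling, and a smaller `eps` approaches unregularized optimal transport but may need log-domain updates for numerical stability.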