Practitioners frequently aim to infer an unobserved population trajectory using sample snapshots at multiple time points. For instance, in single-cell sequencing, scientists would like to learn how gene expression evolves over time. But sequencing any cell destroys that cell. So we cannot access any cell's full trajectory, but we can access snapshot samples from many cells. Stochastic differential equations are commonly used to analyze systems with full individual-trajectory access; since here we have only sample snapshots, these methods are inapplicable. The deep learning community has recently explored using Schr\"odinger bridges (SBs) and their extensions to estimate these dynamics. However, these methods either (1) interpolate between just two time points or (2) require a single fixed reference dynamic within the SB, which is often just set to be Brownian motion. But learning piecewise from adjacent time points can fail to capture long-term dependencies. And practitioners are typically able to specify a model class for the reference dynamic but not the exact values of the parameters within it. So we propose a new method that (1) learns the unobserved trajectories from sample snapshots across multiple time points and (2) requires specification only of a class of reference dynamics, not a single fixed one. In particular, we suggest an iterative projection method inspired by Schr\"odinger bridges; we alternate between learning a piecewise SB on the unobserved trajectories and using the learned SB to refine our best guess for the dynamics within the reference class. We demonstrate the advantages of our method via a well-known simulated parametric model from ecology, simulated and real data from systems biology, and real motion-capture data.
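One way to read the alternating scheme described above, offered as a sketch rather than the exact formulation: write $\mathcal{D}$ for the set of path measures whose marginals at the observed times match the snapshot distributions, and $\mathcal{F} = \{p_\theta\}$ for the user-specified class of reference dynamics. These symbols, and the use of the Kullback-Leibler divergence in both steps, are assumptions introduced here for illustration. Starting from an initial guess $\theta^{(0)}$, iteration $k$ would then compute

\[
q^{(k+1)} = \arg\min_{q \in \mathcal{D}} \mathrm{KL}\left(q \,\|\, p_{\theta^{(k)}}\right), \qquad
\theta^{(k+1)} = \arg\min_{\theta} \mathrm{KL}\left(q^{(k+1)} \,\|\, p_{\theta}\right).
\]

Under this reading, the first step is a Schr\"odinger bridge problem with the current reference held fixed, solved piecewise between adjacent time points. The second step is equivalent to a maximum-likelihood fit of the reference parameters to trajectories drawn from the learned bridge, since minimizing $\mathrm{KL}(q \,\|\, p_\theta)$ over $\theta$ maximizes the expected log-likelihood $\mathbb{E}_q[\log p_\theta]$.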