We introduce a novel generative model for representing joint probability distributions of a possibly large number of discrete random variables. The approach uses measure transport by randomized assignment flows on the statistical submanifold of factorizing distributions, which also enables efficient sampling from the target distribution and evaluation of the likelihood of unseen data points. Embedding the flow via the Segre map into the meta-simplex of all discrete joint distributions ensures that any target distribution can be represented in principle; in practice, the complexity of the representable distributions depends only on the parametrization of the affinity function of the dynamical assignment flow system. Our model can be trained in a simulation-free manner, without integration, by conditional Riemannian flow matching, using the training data encoded as closed-form geodesics with respect to the e-connection of information geometry. Because it projects high-dimensional flow matching in the meta-simplex of joint distributions onto the submanifold of factorizing distributions, our approach is strongly motivated by first principles of modeling coupled discrete variables. Numerical experiments on distributions of structured image labelings demonstrate the applicability to large-scale problems, which may include discrete distributions in other application areas. Performance measures show that our approach scales better with an increasing number of classes than recent related work.
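As a rough illustration of two ingredients named above, the following NumPy sketch builds the Segre embedding of a factorizing distribution into the simplex of joint distributions and evaluates a closed-form e-geodesic on a single simplex. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `segre` and `e_geodesic`, the toy sizes `n` and `c`, and the smoothing parameter `eps` used to keep the labeled endpoint strictly positive are all hypothetical choices made for illustration.

```python
from functools import reduce

import numpy as np


def segre(W):
    """Embed n categorical marginals (rows of W, shape (n, c)) as the joint
    product distribution, returned as an array of shape (c,) * n."""
    # Repeated outer products give joint[a1, ..., an] = prod_i W[i, a_i].
    return reduce(np.multiply.outer, W)


def e_geodesic(p, q, t):
    """Point at time t on the e-geodesic between strictly positive
    distributions p and q: a normalized geometric interpolation."""
    log_g = (1.0 - t) * np.log(p) + t * np.log(q)
    g = np.exp(log_g - log_g.max())  # subtract the max for numerical stability
    return g / g.sum()


# Toy sizes (assumed for illustration): n variables with c classes each.
n, c = 3, 4
rng = np.random.default_rng(0)
W = rng.dirichlet(np.ones(c), size=n)  # one categorical marginal per variable
joint = segre(W)                       # c**n entries, infeasible for large n

# e-geodesic from the barycenter to a smoothed one-hot label (assumption:
# the label is smoothed by eps so that the endpoint has full support).
eps = 1e-2
p = np.full(c, 1.0 / c)
q = np.full(c, eps / (c - 1))
q[2] = 1.0 - eps
midpoint = e_geodesic(p, q, 0.5)
```

The joint array has `c**n` entries, which is why working with the `n * c` parameters of the factorizing submanifold, rather than with the meta-simplex directly, matters at scale.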