We introduce a novel generative model for representing joint probability distributions of a possibly large number of discrete random variables. The approach uses measure transport by randomized assignment flows on the statistical submanifold of factorizing distributions, which makes it possible to represent and sample efficiently from any target distribution and to assess the likelihood of unseen data points. The complexity of the target distribution depends only on the parametrization of the affinity function of the dynamical assignment flow system. Our model can be trained in a simulation-free manner by conditional Riemannian flow matching, using the training data encoded in closed form as geodesics on the assignment manifold with respect to the e-connection of information geometry. Numerical experiments on distributions of structured image labelings demonstrate the applicability to large-scale problems, which may include discrete distributions in other application areas. Performance measures show that our approach scales better with an increasing number of classes than recent related work.
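To illustrate what a closed-form geodesic with respect to the e-connection looks like, the following minimal sketch interpolates between two strictly positive probability vectors p and q on a single simplex, using the standard formula in which the point at time t is proportional to p^(1-t) q^t. It is a sketch under stated assumptions, not the model's actual training code: the function names `e_geodesic` and `smoothed_one_hot` and the smoothing parameter `eps` are hypothetical. On a product of simplices, such as an assignment manifold with one simplex per pixel, the same formula would be applied to each factor separately.

```python
import math

def e_geodesic(p, q, t):
    """Point at time t on the e-geodesic from p to q in the open simplex.

    Assumes p and q are strictly positive and each sums to 1.
    """
    # Interpolate linearly in log-coordinates, then renormalize.
    logs = [(1.0 - t) * math.log(pi) + t * math.log(qi) for pi, qi in zip(p, q)]
    m = max(logs)  # subtract the maximum for numerical stability
    w = [math.exp(v - m) for v in logs]
    z = sum(w)
    return [wi / z for wi in w]

def smoothed_one_hot(label, n_classes, eps=1e-3):
    """Hypothetical helper: a one-hot label moved slightly into the interior.

    Assumes n_classes >= 2; eps is an illustrative choice, not from the paper.
    """
    off = eps / (n_classes - 1)
    return [1.0 - eps if k == label else off for k in range(n_classes)]

# One variable with 3 classes: start at the barycenter, end near class 2.
p0 = [1.0 / 3] * 3
p1 = smoothed_one_hot(2, 3)
path = [e_geodesic(p0, p1, t) for t in (0.0, 0.5, 1.0)]
```

The smoothing step is needed in this sketch because exact one-hot labels lie on the boundary of the simplex, where the logarithm is undefined.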