We introduce the use of latent subspaces in the exponential parameter space of product manifolds of categorical distributions as a tool for learning generative models of discrete data. The low-dimensional latent space encodes statistical dependencies and removes redundant degrees of freedom among the categorical variables. We equip the parameter domain with a Riemannian geometry under which the spaces and distances are related by isometries, which enables consistent flow matching. In particular, geodesics become straight lines, which makes model training by flow matching effective. Empirical results demonstrate that reduced latent dimensions suffice to represent data for generative modeling.
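The practical payoff of straight-line geodesics can be illustrated with a small sketch. It rests on several assumptions that are not taken from the abstract. The latent subspace is modeled as a hypothetical affine map `theta = A @ z + b` into the per-variable logit (exponential) parameters. Each categorical variable is recovered with a softmax. The flow matching interpolant between a noise sample `z0` and a data code `z1` is the straight line in `z`, so the regression target for the velocity field is the constant `z1 - z0`. The names `decode`, `flow_matching_pair`, `A`, `b` and the chosen dimensions are illustrative only.

```python
import numpy as np

def softmax_rows(theta):
    """Map logits of shape (n, c) to n categorical distributions (rows sum to 1)."""
    shifted = theta - theta.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

def decode(z, A, b, n, c):
    """Hypothetical affine latent subspace: theta = A @ z + b, reshaped to (n, c)."""
    theta = (A @ z + b).reshape(n, c)
    return softmax_rows(theta)

def flow_matching_pair(z0, z1, t):
    """Straight-line interpolant and its constant velocity target.

    Assumes geodesics are straight lines in the latent coordinates, so the
    point at time t is a convex combination and the velocity is z1 - z0.
    """
    zt = (1.0 - t) * z0 + t * z1
    target_velocity = z1 - z0
    return zt, target_velocity

def cfm_loss(v_pred, v_target):
    """Squared-error flow matching objective for a single (zt, t) sample."""
    return np.mean((v_pred - v_target) ** 2)

rng = np.random.default_rng(0)
n, c, d = 4, 3, 2              # 4 variables, 3 categories each, latent dimension 2
A = rng.normal(size=(n * c, d))
b = rng.normal(size=n * c)
z0 = rng.normal(size=d)        # sample from a reference (noise) distribution
z1 = rng.normal(size=d)        # stand-in for a data point encoded in the latent space
zt, v = flow_matching_pair(z0, z1, t=0.3)
p = decode(zt, A, b, n, c)     # product of 4 categorical distributions at time t
loss = cfm_loss(np.zeros(d), v)  # a trivial "model" that predicts zero velocity
print(p.shape)  # (4, 3)
```

Here the latent dimension `d = 2` is far smaller than the `n * (c - 1) = 8` free parameters of the full product of categorical distributions. This mirrors, in toy form, the reduction of redundant degrees of freedom described above. Because the target velocity does not depend on `t`, no geodesic ODE has to be solved to build training pairs.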