Flow-based models typically define a latent space with the same dimensionality as the observation space. In many problems, however, the data do not populate the full ambient space in which they natively reside, but instead lie on a lower-dimensional manifold. In such scenarios, flow-based models cannot represent the data structure exactly, because their densities always have support off the data manifold, which can degrade model performance. To address this issue, we propose learning a manifold prior for flow models that leverages the recently proposed spread divergence to fix a crucial problem: the KL divergence and maximum likelihood estimation are ill-defined for manifold learning. In addition to improving both sample quality and representation quality, our approach has an auxiliary benefit: it can identify the intrinsic dimension of the manifold distribution.
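
The ill-definedness mentioned above can be made concrete with a short sketch. The notation here is an assumption introduced for illustration and does not appear in the abstract: $p$ denotes the data distribution, $q$ the model distribution, $k(y \mid x)$ a noise kernel, and $\sigma$ its (fixed) noise scale. When $p$ and $q$ are supported on different low-dimensional manifolds, $p$ is not absolutely continuous with respect to $q$, so the KL divergence is not well defined. A spread divergence, in its common Gaussian-kernel form, instead compares noise-smoothed versions of the two distributions:

```latex
% KL divergence: requires p to be absolutely continuous w.r.t. q,
% which fails when the supports are mismatched manifolds.
\mathrm{KL}(p \,\|\, q) = \int p(x) \log \frac{p(x)}{q(x)} \, dx

% Spread both distributions with the same noise kernel
% (assumed here: isotropic Gaussian with fixed scale \sigma).
k(y \mid x) = \mathcal{N}\!\left(y;\, x,\, \sigma^{2} I\right), \qquad
\tilde{p}(y) = \int k(y \mid x)\, p(x) \, dx, \qquad
\tilde{q}(y) = \int k(y \mid x)\, q(x) \, dx

% Spread KL divergence: well defined because \tilde{p} and \tilde{q}
% both have full support in the ambient space.
\widetilde{\mathrm{KL}}(p \,\|\, q) = \mathrm{KL}(\tilde{p} \,\|\, \tilde{q})
```

With a Gaussian kernel, the spread divergence is zero exactly when $p = q$, so minimizing it remains a meaningful training objective even when the data lie on a manifold. Whether the proposed method uses exactly this kernel is not stated in the abstract.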