Probabilistic Circuits (PCs) are a unified framework for tractable probabilistic models that supports efficient computation of various probabilistic queries (e.g., marginal probabilities). One key challenge is scaling PCs to model large and high-dimensional real-world datasets: we observe that as the number of parameters in a PC increases, its performance quickly plateaus. This phenomenon suggests that existing optimizers fail to exploit the full expressive power of large PCs. We propose to overcome this bottleneck via latent variable distillation: we leverage less tractable but more expressive deep generative models to provide extra supervision over the latent variables of PCs. Specifically, we extract information from Transformer-based generative models to assign values to the latent variables of PCs, providing guidance to PC optimizers. Experiments on both image and language modeling benchmarks (e.g., ImageNet and WikiText-2) show that latent variable distillation substantially boosts the performance of large PCs compared to counterparts trained without it. In particular, on the image modeling benchmarks, PCs achieve competitive performance against some widely used deep generative models, including variational autoencoders and flow-based models, opening up new avenues for tractable generative modeling. Our code can be found at https://github.com/UCLA-StarAI/LVD.
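To make the idea concrete, the following is a minimal, hypothetical sketch of latent variable distillation for the simplest possible PC, a mixture model. It is not the paper's implementation: the "deep model" embeddings are simulated with NumPy, and k-means stands in for whatever procedure maps Transformer features to discrete latent values. Step 1 distills latent assignments from the embeddings; Step 2 uses those fixed assignments as supervision, so the PC's parameters have closed-form maximum-likelihood estimates instead of requiring latent-variable optimization from scratch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated embeddings of 300 data points (stand-in for features extracted
# from a pretrained Transformer-based generative model).
emb = np.concatenate([rng.normal(-2, 1, (150, 4)), rng.normal(2, 1, (150, 4))])
data = emb + rng.normal(0, 0.1, emb.shape)  # the observed variables

def kmeans(x, k, iters=20):
    """Tiny k-means: hard cluster assignments serve as distilled latents."""
    centers = x[[0, -1]].astype(float).copy()  # deterministic init for the sketch
    for _ in range(iters):
        z = np.argmin(((x[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.stack([x[z == j].mean(0) for j in range(k)])
    return z

# Step 1: latent values induced by the deep model's features.
z = kmeans(emb, k=2)

# Step 2: with latents fixed, the mixture PC's parameters are closed-form.
weights = np.bincount(z, minlength=2) / len(z)              # sum-node weights
means = np.stack([data[z == j].mean(0) for j in range(2)])  # leaf parameters

print(weights)  # mixing proportions learned under LVD supervision
```

In a real PC the latent space is structured (one latent per region of the circuit) and the supervised parameters are subsequently fine-tuned, but the division of labor is the same: the expressive model decides *which* latent state explains each example, and the PC optimizer only has to fit parameters given that guidance.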