Probabilistic Circuits (PCs) are a general, unified computational framework for tractable probabilistic models (TPMs) that support efficient computation of various inference tasks (e.g., computing marginal probabilities). To bring these reasoning capabilities to complex real-world tasks, Liu et al. (2022) propose distilling knowledge (through latent variable assignments) from less tractable but more expressive deep generative models. However, it remains unclear which factors make this distillation work well. In this paper, we show theoretically and empirically that the performance of a PC can exceed that of its teacher model. Therefore, instead of distilling from the most expressive deep generative model, we study what properties the teacher model and the PC should have in order to achieve good distillation performance. This leads to a generic algorithmic improvement, as well as data-type-specific ones, over the existing latent variable distillation pipeline. Empirically, we outperform state-of-the-art (SoTA) TPMs by a large margin on challenging image modeling benchmarks. In particular, on ImageNet32, PCs achieve 4.06 bits-per-dimension, only 0.34 bits-per-dimension behind variational diffusion models (Kingma et al., 2021).
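To make the notion of tractable marginals concrete, the following is a minimal sketch of a toy circuit, not the models or pipeline studied in this paper. It assumes a single sum node over two product nodes with Bernoulli leaves; the parameter values and the names `WEIGHTS`, `P_X1`, `P_X2`, `leaf`, `circuit`, and `bits_per_dimension` are hypothetical and chosen only for illustration. A variable is marginalized by letting its leaves output 1, so any marginal costs one pass through the circuit. The sum node's component index can be read as a latent variable.

```python
import math
from itertools import product

# Hypothetical toy parameters (not taken from the paper):
# p(x1, x2) = sum_k WEIGHTS[k] * Bern(x1; P_X1[k]) * Bern(x2; P_X2[k])
WEIGHTS = [0.3, 0.7]   # sum-node weights, one per latent component k
P_X1 = [0.9, 0.2]      # P(x1 = 1 | component k)
P_X2 = [0.8, 0.1]      # P(x2 = 1 | component k)


def leaf(value, p_one):
    """Bernoulli leaf. Passing value=None marginalizes the variable (leaf outputs 1)."""
    if value is None:
        return 1.0
    return p_one if value == 1 else 1.0 - p_one


def circuit(x1, x2):
    """One bottom-up pass: product nodes multiply leaves, the sum node mixes them."""
    return sum(
        w * leaf(x1, a) * leaf(x2, b)
        for w, a, b in zip(WEIGHTS, P_X1, P_X2)
    )


def bits_per_dimension(samples):
    """Average negative log2-likelihood per variable over a list of (x1, x2) pairs."""
    total_bits = -sum(math.log2(circuit(x1, x2)) for x1, x2 in samples)
    return total_bits / (len(samples) * 2)


if __name__ == "__main__":
    print("p(x1=1, x2=1) =", circuit(1, 1))
    print("p(x1=1)       =", circuit(1, None))  # marginal in a single pass
    print("bpd on all four states =", bits_per_dimension(list(product([0, 1], repeat=2))))
```

Here `circuit(1, None)` returns the marginal of the first variable without enumerating the second. This is the property that distinguishes a tractable model from a deep generative model with an intractable marginal.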