Probabilistic Circuits (PCs) are prominent tractable probabilistic models, supporting a range of exact inference queries. This paper focuses on the main algorithm for training PCs, LearnSPN, widely considered a gold standard thanks to its efficiency, performance, and ease of use, particularly on tabular data. We show that LearnSPN is a greedy likelihood maximizer under mild assumptions. While inference in PCs may use the entire circuit structure to process a query, LearnSPN learns that structure with a hard method: at each sum node, a data point is propagated through one and only one of the children/edges, as in a hard clustering process. We propose a new learning procedure named SoftLearn, which induces a PC using a soft clustering process, and we investigate the effect of this learning-inference compatibility in PCs. Our experiments show that SoftLearn outperforms LearnSPN in many situations, yielding better likelihoods and arguably better samples. We also analyze comparable tractable models to highlight the differences between soft/hard learning and model querying.
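The hard/soft distinction described above can be illustrated with a small sketch. This is not the paper's actual algorithm, only a minimal illustration assuming univariate Gaussian mixture components at a sum node: the soft assignment computes fractional responsibilities for every child (as a SoftLearn-style procedure would weight data points), while the hard assignment (LearnSPN-style) routes each data point to exactly one child.

```python
import math

def responsibilities(x, components):
    """Soft assignment: posterior weight of each component for point x.

    components: list of (weight, mean, std) tuples -- hypothetical
    Gaussian children of a sum node, used purely for illustration.
    """
    dens = [
        w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
        for w, m, s in components
    ]
    total = sum(dens)
    return [d / total for d in dens]

def hard_assign(x, components):
    """Hard assignment: route x through one and only one child (one-hot)."""
    r = responsibilities(x, components)
    best = max(range(len(r)), key=r.__getitem__)
    return [1.0 if i == best else 0.0 for i in range(len(r))]

if __name__ == "__main__":
    comps = [(0.5, 0.0, 1.0), (0.5, 4.0, 1.0)]
    print(responsibilities(1.0, comps))  # fractional weights summing to 1
    print(hard_assign(1.0, comps))       # one-hot: all mass on one child
```

In the soft case, a point near a cluster boundary contributes partially to several children's sub-problems; in the hard case, that same point influences only one branch of the learned circuit.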