Obtaining a reliable estimate of the joint probability mass function (PMF) of a set of random variables from observed data is a significant objective in statistical signal processing and machine learning. Modelling the joint PMF as a tensor that admits a low-rank canonical polyadic decomposition (CPD) has enabled the development of efficient PMF estimation algorithms. However, these algorithms require the rank (model order) of the tensor to be specified beforehand, and in real-world applications the true rank is unknown. An appropriate rank is therefore usually selected from a candidate set, either by observing validation errors or by computing likelihood-based information criteria. This procedure can be costly in computation time or hardware resources, and it can produce mismatched models that degrade accuracy. This paper presents a novel Bayesian framework for estimating the low-rank components of a joint PMF tensor and simultaneously inferring its rank from the observed data. We specify a Bayesian PMF estimation model and employ appropriate prior distributions for the model parameters, allowing the rank to be inferred without cross-validation. We then derive a deterministic solution based on variational inference (VI) to approximate the posterior distributions of the model parameters. Numerical experiments on synthetic data and on real classification and item recommendation data illustrate the advantages of our VI-based method in terms of estimation accuracy, automatic rank detection, and computational efficiency.
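To make the modelling assumption concrete, the following is a minimal sketch of a joint PMF tensor built from rank-R CPD components, assuming three discrete variables. The function names `cpd_pmf` and `random_components`, the dimensions, and the Dirichlet draws used to generate components are illustrative assumptions, not the paper's estimation algorithm or its priors. Here the rank is fixed by hand, which is exactly the choice the proposed framework aims to infer from data.

```python
import numpy as np


def random_components(dims, rank, rng):
    """Draw hypothetical CPD components for illustration only.

    lam has shape (rank,) and sums to 1.
    factors[n] has shape (dims[n], rank); each column sums to 1.
    """
    lam = rng.dirichlet(np.ones(rank))
    # dirichlet(..., size=rank) returns shape (rank, d); transpose to (d, rank).
    factors = [rng.dirichlet(np.ones(d), size=rank).T for d in dims]
    return lam, factors


def cpd_pmf(lam, factors):
    """Assemble a three-variable joint PMF tensor from CPD components.

    P[i, j, k] = sum_r lam[r] * A1[i, r] * A2[j, r] * A3[k, r]
    Assumes exactly three factor matrices.
    """
    a1, a2, a3 = factors
    return np.einsum("r,ir,jr,kr->ijk", lam, a1, a2, a3)


rng = np.random.default_rng(0)
lam, factors = random_components(dims=(4, 3, 5), rank=2, rng=rng)
pmf = cpd_pmf(lam, factors)

# The result is a valid PMF: nonnegative entries that sum to 1.
print(pmf.shape, np.isclose(pmf.sum(), 1.0))
```

Because the weights and every factor column are themselves probability vectors, the assembled tensor is a valid PMF by construction, and any unfolding of it has matrix rank at most R.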