Completely random measures (CRMs) and their normalizations (NCRMs) offer flexible models in Bayesian nonparametrics, but their infinite dimensionality presents challenges for inference. Two popular finite approximations are truncated finite approximations (TFAs) and independent finite approximations (IFAs). While the former have been well studied, IFAs lack similarly general bounds on approximation error, and there has been no systematic comparison between the two options. In the present work, we propose a general recipe to construct practical finite-dimensional approximations for homogeneous CRMs and NCRMs, in the presence or absence of power laws. We call our construction the automated independent finite approximation (AIFA). Relative to TFAs, we show that AIFAs facilitate more straightforward derivations and the use of parallel computing in approximate inference. We upper bound the approximation error of AIFAs for a wide class of common CRMs and NCRMs, and thereby develop guidelines for choosing the approximation level. Our lower bounds in key cases suggest that our upper bounds are tight. We prove that, for worst-case choices of observation likelihoods, TFAs are more efficient than AIFAs. Conversely, we find that in real-data experiments with standard likelihoods, AIFAs and TFAs perform similarly. Moreover, we demonstrate that AIFAs can be used for hyperparameter estimation even when other potential IFA options struggle or do not apply.
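To make the distinction between the two approximation families concrete, the following minimal sketch draws K-atom weights for the Dirichlet process (a normalized gamma CRM) in both styles: a truncated stick-breaking representation as a TFA, and normalized i.i.d. gamma weights (equivalently a symmetric Dirichlet with parameter alpha/K) as an IFA. It illustrates these two well-known constructions only and is not the AIFA recipe proposed in this work; the function names and parameter values are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def tfa_dirichlet_process(alpha, K, rng):
    """Truncated stick-breaking weights (a TFA-style approximation).

    Sticks are drawn sequentially, so atom k depends on atoms 1..k-1.
    """
    v = rng.beta(1.0, alpha, size=K)
    v[-1] = 1.0  # assign all leftover mass to the last atom
    # Mass remaining before each break: 1, (1-v1), (1-v1)(1-v2), ...
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining


def ifa_dirichlet_process(alpha, K, rng):
    """Normalized i.i.d. Gamma(alpha/K, 1) weights (an IFA-style approximation).

    The unnormalized atoms are independent and identically distributed,
    which is what makes per-atom updates easy to parallelize.
    """
    g = rng.gamma(alpha / K, 1.0, size=K)
    return g / g.sum()


# Hypothetical settings chosen only for illustration.
rng = np.random.default_rng(0)
w_tfa = tfa_dirichlet_process(alpha=2.0, K=50, rng=rng)
w_ifa = ifa_dirichlet_process(alpha=2.0, K=50, rng=rng)
```

In the truncated version the weights are built sequentially and tend to decrease with the atom index, whereas in the independent version the atoms are exchangeable. That structural difference is what underlies the trade-off described above between approximation efficiency and ease of derivation and parallelization.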