Probabilities of the Third Type: Statistical Relational Learning and Reasoning with Relative Frequencies

Dependencies on the relative frequency of a state in the domain are common when modelling probabilistic dependencies on relational data. For instance, the likelihood of a school closure during an epidemic might depend on the proportion of infected pupils exceeding a threshold. Often, rather than depending on discrete thresholds, dependencies are continuous: for instance, the likelihood of any one mosquito bite transmitting an illness depends on the proportion of carrier mosquitoes. Current approaches usually consider only probabilities over possible worlds rather than over domain elements themselves. One exception is the recently introduced formalism of lifted Bayesian networks for conditional probability logic, which express discrete dependencies on probabilistic data. We introduce functional lifted Bayesian networks (FLBNs), a formalism that explicitly incorporates continuous dependencies on relative frequencies into statistical relational artificial intelligence, and compare and contrast them with lifted Bayesian networks for conditional probability logic. Incorporating relative frequencies is not only beneficial to modelling; it also provides a more rigorous approach to learning problems where training and test or application domains have different sizes. To this end, we provide a representation of the asymptotic probability distributions induced by functional lifted Bayesian networks on domains of increasing sizes. Since that representation has well-understood scaling behaviour across domain sizes, it can be used to estimate parameters for a large domain consistently from randomly sampled subpopulations. Furthermore, we show that in parametric families of FLBNs, convergence is uniform in the parameters, which ensures a meaningful dependence of the asymptotic probabilities on the parameters of the model.
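To make the mosquito example concrete, the following is a minimal illustrative sketch, not the FLBN formalism itself. It assumes a hypothetical linear dependency `transmission_prob` and hypothetical parameter values (`carrier_prob=0.2`, `rate=0.3`); none of these names or numbers come from the paper. It only shows how a continuous dependency on a relative frequency behaves as the domain grows.

```python
import random


def transmission_prob(carrier_fraction, rate=0.3):
    # Hypothetical continuous dependency: linear in the relative
    # frequency of carriers. The functional form is an assumption.
    return rate * carrier_fraction


def sample_domain(n, carrier_prob, rate, rng):
    """Sample one domain of n mosquitoes.

    Returns (carrier flags, transmission flags), one entry per mosquito.
    """
    carriers = [rng.random() < carrier_prob for _ in range(n)]
    fraction = sum(carriers) / n  # relative frequency within this domain
    p = transmission_prob(fraction, rate)
    transmits = [rng.random() < p for _ in range(n)]
    return carriers, transmits


def relative_frequency(flags):
    return sum(flags) / len(flags)


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (100, 1_000, 20_000):
        carriers, transmits = sample_domain(n, carrier_prob=0.2, rate=0.3, rng=rng)
        print(n, relative_frequency(carriers), relative_frequency(transmits))
```

In this toy setting the carrier fraction concentrates around `carrier_prob` as `n` grows, so the per-bite transmission probability stabilises near `rate * carrier_prob`. A random subsample of a large domain also gives a close estimate of the carrier fraction, which is the intuition behind estimating parameters from sampled subpopulations.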
