We propose to quantify dependence between two systems $X$ and $Y$ in a dataset $D$ based on the Bayesian comparison of two models: one, $H_0$, of statistical independence and the other, $H_1$, of dependence. In this framework, dependence between $X$ and $Y$ in $D$, denoted $B(X,Y|D)$, is quantified as $P(H_1|D)$, the posterior probability of the model of dependence given $D$, or any strictly increasing function thereof. It is therefore a measure of the evidence for dependence between $X$ and $Y$ as modeled by $H_1$ and observed in $D$. We review several statistical models and reconsider standard results in light of $B(X,Y|D)$ as a measure of dependence. Using simulations, we focus on two specific issues: the effect of noise and the behavior of $B(X,Y|D)$ when $H_1$ has a parameter coding for the intensity of dependence. We then derive some general properties of $B(X,Y|D)$, showing that it quantifies the information contained in $D$ in favor of $H_1$ versus $H_0$. While some of these properties are typical of what is expected from a valid measure of dependence, others are novel and naturally appear as desired features for specific measures of dependence, which we call inferential. Finally, we put these results in perspective, discussing in particular the consequences of using the Bayesian framework as well as the similarities and differences between $B(X,Y|D)$ and mutual information.
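To make the definition $B(X,Y|D) = P(H_1|D)$ concrete, here is a minimal sketch for one simple setting: two categorical variables summarized in a contingency table. The model choice is an assumption made for illustration and is not taken from the abstract: under $H_1$ the joint cell probabilities get a symmetric Dirichlet prior, under $H_0$ the row and column marginals get independent symmetric Dirichlet priors, and both models are given equal prior probability by default. The function names (`log_multivariate_beta`, `log_marginal_likelihood`, `posterior_dependence`) and the parameters `alpha` and `prior_h1` are hypothetical.

```python
import math


def log_multivariate_beta(alphas):
    """Log of the multivariate beta function: sum(lgamma(a)) - lgamma(sum(a))."""
    return sum(math.lgamma(a) for a in alphas) - math.lgamma(sum(alphas))


def log_marginal_likelihood(counts, alpha=1.0):
    """Log marginal likelihood of categorical data with the given counts
    under a symmetric Dirichlet(alpha) prior (ordered observations, so no
    multinomial coefficient; it would cancel in the model comparison)."""
    posterior = [c + alpha for c in counts]
    prior = [alpha] * len(counts)
    return log_multivariate_beta(posterior) - log_multivariate_beta(prior)


def posterior_dependence(table, prior_h1=0.5, alpha=1.0):
    """Return P(H1 | D) for a contingency table given as a list of rows.

    Assumed models (illustrative only):
      H0: cell probability = row probability * column probability
      H1: unconstrained joint cell probabilities
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    cells = [n for row in table for n in row]

    log_m0 = (log_marginal_likelihood(row_totals, alpha)
              + log_marginal_likelihood(col_totals, alpha))
    log_m1 = log_marginal_likelihood(cells, alpha)

    # Posterior log-odds = log Bayes factor + prior log-odds.
    log_odds = log_m1 - log_m0 + math.log(prior_h1) - math.log(1.0 - prior_h1)

    # Numerically stable logistic transform of the log-odds.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    odds = math.exp(log_odds)
    return odds / (1.0 + odds)


if __name__ == "__main__":
    dependent = [[30, 0], [0, 30]]
    balanced = [[15, 15], [15, 15]]
    print(posterior_dependence(dependent))
    print(posterior_dependence(balanced))
```

In this sketch a perfectly diagonal table yields a posterior close to 1, while a perfectly balanced table yields a posterior below the prior value of 0.5, because the data then favor the simpler independence model. Any strictly increasing transform of the returned value, such as the posterior log-odds, would rank datasets identically.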