Traditional models that rely solely on pairwise associations often fail to capture the complex statistical structure inherent in multivariate data. Yet existing methods for identifying information shared among groups of $d>3$ variables are often intractable, asymmetric around a target variable, or unable to consider all factorisations of the joint probability distribution. Here, we present a framework that systematically derives high-order measures using lattice and operator function pairs, whereby the lattice captures the algebraic relational structure of the variables and the operator function computes measures over the lattice. We show that many existing information-theoretic high-order measures can be derived by using divergences as operator functions on sublattices of the partition lattice, a restriction that prevents the accurate quantification of all interactions for $d>3$. Similarly, we show that using the KL divergence as the operator function also leads to unwanted cancellation of interactions for $d>3$. To characterise all interactions among $d$ variables, we introduce the Streitberg information, defined on the full partition lattice using generalisations of the KL divergence as operator functions. We validate our results numerically on synthetic data and illustrate the use of the Streitberg information through applications to stock market returns and neural electrophysiology data.
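To make the lattice/operator-function pairing concrete, the sketch below assumes discrete variables whose joint probability table is stored as a NumPy array. It enumerates the full partition lattice of the $d$ variables and combines products of block marginals with the Möbius coefficients $(-1)^{k-1}(k-1)!$, where $k$ is the number of blocks. This is Streitberg's classical interaction measure, which vanishes whenever the joint distribution factorises. It is not the Streitberg information itself, since that quantity is defined through generalisations of the KL divergence that this sketch does not attempt to reproduce. The function names `set_partitions` and `streitberg_interaction` are hypothetical.

```python
from math import factorial

import numpy as np


def set_partitions(items):
    """Yield every partition of `items` (the elements of the partition lattice)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        # Place `first` in a new singleton block...
        yield [[first]] + part
        # ...or add it to each existing block in turn.
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]


def streitberg_interaction(p):
    """Signed Streitberg interaction of a joint probability table `p` (one axis per variable).

    Sums, over all partitions pi of the axes, mu(pi) * prod_{block in pi} p_block,
    where mu(pi) = (-1)**(k-1) * (k-1)! and k = len(pi).
    """
    d = p.ndim
    delta = np.zeros(p.shape, dtype=float)
    for partition in set_partitions(list(range(d))):
        k = len(partition)
        coeff = (-1) ** (k - 1) * factorial(k - 1)
        term = np.ones(p.shape, dtype=float)
        for block in partition:
            # Marginalise out every axis outside the block; keepdims lets the
            # block marginals broadcast back to the full table shape.
            others = tuple(ax for ax in range(d) if ax not in block)
            term = term * p.sum(axis=others, keepdims=True)
        delta += coeff * term
    return delta


# Usage: XOR of two fair bits is pairwise independent but has a purely
# three-way interaction, which this measure detects.
xor = np.zeros((2, 2, 2))
for a in (0, 1):
    for b in (0, 1):
        xor[a, b, a ^ b] = 0.25
delta_xor = streitberg_interaction(xor)
```

For $d=2$ the sum reduces to $p_{12} - p_1 p_2$, and for $d=3$ it reduces to the Lancaster interaction $p_{123} - p_1 p_{23} - p_2 p_{13} - p_3 p_{12} + 2 p_1 p_2 p_3$. For $d \geq 4$, the full partition lattice also includes terms such as $p_{12}\,p_{34}$ that no sublattice of singleton splits would contain.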