Models that rely solely on pairwise relationships often fail to capture the complete statistical structure of the complex multivariate data found in diverse domains, such as socio-economic, ecological, or biomedical systems. Non-trivial dependencies between groups of more than two variables can play a significant role in the analysis and modelling of such systems, yet extracting such high-order interactions from data remains challenging. Here, we introduce a hierarchy of $d$-order ($d \geq 2$) interaction measures, increasingly inclusive of possible factorisations of the joint probability distribution, and define non-parametric, kernel-based tests to establish systematically the statistical significance of $d$-order interactions. We also establish mathematical links with lattice theory, which elucidate the derivation of the interaction measures and their composite permutation tests; clarify the connection of simplicial complexes with kernel matrix centring; and provide a means to enhance computational efficiency. We illustrate our results numerically with validations on synthetic data, and through an application to neuroimaging data.
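The tests described above rest on two ingredients: centred kernel (Gram) matrices and permutation resampling. The sketch below illustrates both in the simplest case, $d = 2$. It computes the biased (V-statistic) Hilbert-Schmidt independence criterion, $\frac{1}{n^2}\sum_{a,b}(HKH)_{ab}(HLH)_{ab}$ with centring matrix $H = I - \frac{1}{n}\mathbf{1}\mathbf{1}^\top$, and derives a permutation p-value. This is the standard pairwise kernel independence test, not the paper's $d$-order measures or composite permutation tests. The Gaussian kernel, the bandwidth `sigma`, and all function names are illustrative assumptions.

```python
import math
import random


def gaussian_gram(xs, sigma=1.0):
    """Gram matrix K[a][b] = exp(-(x_a - x_b)^2 / (2 sigma^2)) for scalar samples (assumed kernel)."""
    return [[math.exp(-((a - b) ** 2) / (2 * sigma ** 2)) for b in xs] for a in xs]


def centre(K):
    """Return H K H, where H = I - (1/n) 11^T (subtract row/column means, add back grand mean)."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[a][b] for a in range(n)) / n for b in range(n)]
    tot = sum(row) / n
    return [[K[a][b] - row[a] - col[b] + tot for b in range(n)] for a in range(n)]


def hsic_v(Kc, Lc):
    """Biased HSIC from centred Gram matrices: (1/n^2) * sum_ab Kc[a][b] * Lc[a][b]."""
    n = len(Kc)
    return sum(Kc[a][b] * Lc[a][b] for a in range(n) for b in range(n)) / n ** 2


def permutation_test(x, y, n_perm=200, sigma=1.0, seed=0):
    """Return (observed statistic, permutation p-value) for H0: x and y are independent."""
    rng = random.Random(seed)
    Kc = centre(gaussian_gram(x, sigma))
    Lc = centre(gaussian_gram(y, sigma))
    observed = hsic_v(Kc, Lc)
    n = len(x)
    exceed = 0
    for _ in range(n_perm):
        p = list(range(n))
        rng.shuffle(p)
        # Permuting rows and columns of the centred matrix equals re-centring a permuted y,
        # because H is invariant under simultaneous row/column permutation.
        Lp = [[Lc[p[a]][p[b]] for b in range(n)] for a in range(n)]
        if hsic_v(Kc, Lp) >= observed:
            exceed += 1
    # The +1 terms count the observed labelling itself, keeping the p-value valid.
    return observed, (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = random.Random(1)
    x = [rng.gauss(0, 1) for _ in range(50)]
    y_dep = [v * v + 0.1 * rng.gauss(0, 1) for v in x]  # nonlinear, uncorrelated dependence
    y_ind = [rng.gauss(0, 1) for _ in range(50)]         # independent of x
    print(permutation_test(x, y_dep))
    print(permutation_test(x, y_ind))
```

Shuffling the indices of one variable destroys any dependence while preserving both marginals, so the permuted statistics approximate the null distribution. In the $d = 2$ case, the product of centred Gram matrices is what links the kernel statistic to matrix centring.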