Models that rely solely on pairwise relationships often fail to capture the complete statistical structure of the complex multivariate data found in diverse domains, such as socio-economic, ecological, or biomedical systems. Non-trivial dependencies between groups of more than two variables can play a significant role in the analysis and modelling of such systems, yet extracting such high-order interactions from data remains challenging. Here, we introduce a hierarchy of $d$-order ($d \geq 2$) interaction measures, increasingly inclusive of possible factorisations of the joint probability distribution, and define non-parametric, kernel-based tests to establish systematically the statistical significance of $d$-order interactions. We also establish mathematical links with lattice theory, which elucidate the derivation of the interaction measures and their composite permutation tests; clarify the connection of simplicial complexes with kernel matrix centring; and provide a means to enhance computational efficiency. We illustrate our results numerically with validations on synthetic data, and through an application to neuroimaging data.
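As a rough illustration of the kind of kernel-based permutation test described above, the sketch below covers only the pairwise case ($d = 2$) using the standard Hilbert-Schmidt Independence Criterion (HSIC) with a centred Gram matrix. It is not the hierarchy of $d$-order measures introduced here, nor the composite tests. The function names (`gaussian_gram`, `centre`, `hsic_permutation_test`), the Gaussian kernel, the fixed bandwidth, and the number of permutations are all assumptions made for the example.

```python
import numpy as np


def gaussian_gram(x, bandwidth=1.0):
    """Gaussian (RBF) Gram matrix of a sample; bandwidth=1.0 is an assumed default."""
    x = np.asarray(x, dtype=float)
    x = x.reshape(x.shape[0], -1)           # treat each row as one observation
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    d2 = np.maximum(d2, 0.0)                # guard against tiny negative round-off
    return np.exp(-d2 / (2.0 * bandwidth ** 2))


def centre(K):
    """Double-centre a Gram matrix: H K H with H = I - (1/n) 11^T."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H


def hsic_permutation_test(x, y, n_perm=500, seed=0):
    """Biased HSIC statistic and a permutation p-value (pairwise case only)."""
    rng = np.random.default_rng(seed)
    Kc = centre(gaussian_gram(x))
    L = gaussian_gram(y)
    n = Kc.shape[0]

    # trace(H K H L) / n^2, written as an elementwise sum because L is symmetric
    observed = np.sum(Kc * L) / n ** 2

    null = np.empty(n_perm)
    for b in range(n_perm):
        p = rng.permutation(n)              # shuffle y to break any dependence on x
        null[b] = np.sum(Kc * L[np.ix_(p, p)]) / n ** 2

    # add-one correction keeps the p-value strictly positive
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, p_value


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(size=100)
    y = x + 0.1 * rng.normal(size=100)      # strongly dependent on x
    stat, p = hsic_permutation_test(x, y)
    print(f"HSIC = {stat:.4f}, permutation p-value = {p:.4f}")
```

Permuting only one of the two Gram matrices is enough to simulate the null of independence in the pairwise case; tests for $d > 2$ involve more than one factorisation of the joint distribution and would need a different permutation scheme than this sketch provides.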