Kernel-based testing has revolutionized non-parametric testing by embedding distributions in a reproducing kernel Hilbert space (RKHS). This strategy has proven powerful and flexible, yet its applicability has been limited to the standard two-sample case, whereas practical situations often involve more complex experimental designs. To extend kernel testing to any design, we propose a linear model in the RKHS that decomposes mean embeddings into additive functional effects. We then introduce a truncated kernel Hotelling-Lawley statistic to test the effects of the model, and show that its asymptotic distribution is chi-square, a result that remains valid under its Nyström approximation. We discuss a homoscedasticity assumption that, although absent in the standard two-sample case, is necessary for general designs. Finally, we illustrate our framework on a single-cell RNA sequencing dataset and provide kernel-based generalizations of classical diagnostic and exploration tools, broadening the scope of kernel testing to any experimental design.
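To make the construction concrete, here is a minimal sketch of how a truncated Hotelling-Lawley trace could be computed on Nyström features. It is not the authors' implementation: the function names, the Gaussian kernel, the bandwidth, the random choice of landmarks, the 1/n normalization of the residual covariance, and the one-hot two-group design are all assumptions made for illustration. The sketch fits a least-squares linear model on the explicit features, keeps the T leading eigen-directions of the residual covariance, and evaluates the statistic for a contrast L. Under the null hypothesis, and under the homoscedasticity assumption mentioned above, the statistic would be compared with a chi-square distribution with d × T degrees of freedom, where d is the number of rows of L (assumed to have full row rank).

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def nystrom_features(y, landmarks, bandwidth, tol=1e-10):
    """Explicit features z_i such that z_i . z_j approximates k(y_i, y_j)."""
    k_mm = gaussian_kernel(landmarks, landmarks, bandwidth)
    eigvals, eigvecs = np.linalg.eigh(k_mm)
    keep = eigvals > tol * eigvals.max()  # drop numerically null directions
    k_nm = gaussian_kernel(y, landmarks, bandwidth)
    return k_nm @ eigvecs[:, keep] / np.sqrt(eigvals[keep])


def truncated_hotelling_lawley(features, design, contrast, truncation):
    """Truncated Hotelling-Lawley trace for H0: contrast @ theta = 0.

    Assumes `contrast` has full row rank and `truncation` does not exceed
    the number of non-null eigenvalues of the residual covariance.
    """
    n = features.shape[0]
    xtx_inv = np.linalg.pinv(design.T @ design)
    theta_hat = xtx_inv @ design.T @ features        # (p, m) effect estimates
    residuals = features - design @ theta_hat
    sigma_hat = residuals.T @ residuals / n          # (m, m) residual covariance
    eigvals, eigvecs = np.linalg.eigh(sigma_hat)
    top = np.argsort(eigvals)[::-1][:truncation]     # T leading directions
    lam, u = eigvals[top], eigvecs[:, top]
    effect = contrast @ theta_hat                    # (d, m) tested effects
    scale = np.linalg.inv(contrast @ xtx_inv @ contrast.T)  # (d, d)
    projected = effect @ u                           # (d, T)
    # trace(Sigma_T^{-1} H) with H = effect' @ scale @ effect
    stat = float(np.sum(np.diag(projected.T @ scale @ projected) / lam))
    dof = contrast.shape[0] * truncation
    return stat, dof


# Hypothetical two-group example: 100 observations per group, mean shift of 1.
rng = np.random.default_rng(0)
y = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(1.0, 1.0, (100, 2))])
design = np.repeat(np.eye(2), 100, axis=0)           # one-hot group indicators
contrast = np.array([[1.0, -1.0]])                   # tests group 1 == group 2
landmarks = y[rng.choice(len(y), size=20, replace=False)]
features = nystrom_features(y, landmarks, bandwidth=1.0)
stat, dof = truncated_hotelling_lawley(features, design, contrast, truncation=3)
print(f"statistic = {stat:.2f}, degrees of freedom = {dof}")
```

Richer designs would only change `design` and `contrast`; the truncation level and the number of landmarks are tuning choices that this sketch fixes arbitrarily.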