Testing differences in mean vectors is a fundamental task in the analysis of high-dimensional compositional data. Existing methods may suffer from low power when the underlying signal pattern does not favor the test being used. In this work, we develop two-sample power-enhanced mean tests for high-dimensional compositional data based on the combination of $p$-values, which integrates the strengths of two popular types of tests: the maximum-type test and the quadratic-type test. We provide rigorous theoretical guarantees for the proposed tests, showing accurate Type-I error rate control and enhanced testing power. Our method extends the testing power to a broader alternative space, yielding robust performance across a wide range of signal patterns. Our theory also contributes to the literature on power enhancement and Gaussian approximation for high-dimensional hypothesis testing. We demonstrate the performance of our method on both simulated data and real-world microbiome data, showing that the proposed approach substantially improves testing power compared to existing methods.
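To make the idea of combining a maximum-type and a quadratic-type test concrete, the following is a minimal sketch, not the procedure developed in this work. It rests on several assumptions that the abstract does not state: compositions are mapped to the centered log-ratio scale, both statistics are built from coordinate-wise standardized mean differences, the $p$-values are calibrated by permutation rather than by the Gaussian approximation, and the two $p$-values are merged with the Cauchy combination rule. All function names are hypothetical.

```python
import math
import random
from statistics import mean, variance


def clr(row):
    """Centered log-ratio transform of one strictly positive composition."""
    logs = [math.log(v) for v in row]
    center = mean(logs)
    return [v - center for v in logs]


def max_and_quadratic(a, b):
    """Max-type and sum-of-squares statistics from standardized mean differences."""
    t2 = []
    for j in range(len(a[0])):
        col_a = [r[j] for r in a]
        col_b = [r[j] for r in b]
        se2 = variance(col_a) / len(a) + variance(col_b) / len(b)
        t2.append((mean(col_a) - mean(col_b)) ** 2 / se2)
    return max(t2), sum(t2)


def permutation_pvalues(a, b, n_perm=200, seed=0):
    """Permutation p-values for both statistics (assumed calibration, see lead-in)."""
    rng = random.Random(seed)
    pooled = a + b  # new list; a and b are left untouched
    obs_max, obs_quad = max_and_quadratic(a, b)
    hits_max = hits_quad = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        s_max, s_quad = max_and_quadratic(pooled[:len(a)], pooled[len(a):])
        hits_max += s_max >= obs_max
        hits_quad += s_quad >= obs_quad
    # The +1 terms keep the p-values strictly positive.
    return (hits_max + 1) / (n_perm + 1), (hits_quad + 1) / (n_perm + 1)


def cauchy_combine(pvals):
    """Cauchy combination of p-values with equal weights (assumed combination rule)."""
    t = mean(math.tan((0.5 - p) * math.pi) for p in pvals)
    return 0.5 - math.atan(t) / math.pi
```

Given two lists of clr-transformed samples `a` and `b`, calling `cauchy_combine(permutation_pvalues(a, b))` returns a single combined $p$-value. The intuition is that the maximum-type statistic tends to be sensitive to a few large coordinate differences, the quadratic-type statistic to many small ones, and the combined $p$-value is small whenever either component is.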