Nonparametric tests for functional data are challenging to work with because functional data can be high-dimensional. A central obstacle for rank-based tests, such as the Mann-Whitney-Wilcoxon (MWW) rank-sum test, is that the unit of observation is a curve, so any rank-based test must first specify how curves are ranked. Several procedures, including depth-based methods, have recently been used to create scores for rank-based tests. However, these scores are not constructed under the null hypothesis and often introduce additional, uncontrolled variability. We therefore reconsider the problem of rank-based tests for functional data and develop an alternative approach that incorporates the null hypothesis throughout. Our approach first ranks the realizations of the curves at each time point. It then summarizes each subject's ranks using a sufficient statistic we derive, and finally re-ranks the sufficient statistics. We refer to this procedure as a doubly ranked test. As we demonstrate, doubly ranked tests are more powerful while maintaining the nominal type I error rate in the two-sample MWW setting. We also extend our framework to more than two samples, developing a Kruskal-Wallis test for functional data that also exhibits good test characteristics. Finally, we illustrate the use of doubly ranked tests in functional data contexts from materials science, climatology, and public health policy.
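The three stages of a doubly ranked test can be sketched in a few lines of Python. This sketch assumes that all curves are observed on a common time grid. The abstract does not state which sufficient statistic is derived, so the per-subject summary is a pluggable `summarize` argument. Its default, the mean of a subject's ranks, is a placeholder and not the authors' statistic. All function names are hypothetical. The final MWW step uses the large-sample normal approximation without a tie correction. The Kruskal-Wallis helper returns only the H statistic, which would be compared against a chi-square distribution with k - 1 degrees of freedom.

```python
import math
from statistics import mean


def average_ranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # 1-based average rank for the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def doubly_ranked_scores(curves, summarize=mean):
    """Stage 1: rank subjects at each time point.
    Stage 2: summarize each subject's ranks across time.
    `summarize` is a placeholder for the derived sufficient statistic."""
    n_time = len(curves[0])
    # per_time[t][s] is the rank of subject s among all subjects at time t.
    per_time = [average_ranks([c[t] for c in curves]) for t in range(n_time)]
    return [summarize([per_time[t][s] for t in range(n_time)])
            for s in range(len(curves))]


def mww_test(x, y):
    """Two-sided MWW test (normal approximation, no tie correction)."""
    ranks = average_ranks(list(x) + list(y))
    n1, n2 = len(x), len(y)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # P(|Z| >= |z|)
    return u, p


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic (no tie correction)."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        r = sum(ranks[start:start + len(g)])  # rank sum for this group
        total += r * r / len(g)
        start += len(g)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


def _split(scores, groups):
    """Cut a pooled score list back into per-group lists."""
    out, start = [], 0
    for g in groups:
        out.append(scores[start:start + len(g)])
        start += len(g)
    return out


def doubly_ranked_mww(group1, group2, summarize=mean):
    """Stage 3 (two samples): re-rank subject summaries with MWW."""
    scores = doubly_ranked_scores(list(group1) + list(group2), summarize)
    x, y = _split(scores, [group1, group2])
    return mww_test(x, y)


def doubly_ranked_kw(*groups, summarize=mean):
    """Stage 3 (k samples): re-rank subject summaries with Kruskal-Wallis."""
    pooled = [c for g in groups for c in g]
    scores = doubly_ranked_scores(pooled, summarize)
    return kruskal_wallis_h(*_split(scores, groups))
```

For example, `doubly_ranked_mww(g1, g2)` takes two lists of curves, where each curve is a list of values on the shared grid. It returns the U statistic and an approximate two-sided p-value. Because the mean summary is only a stand-in, the power and type I error claims in the abstract should not be assumed to carry over to this sketch.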