Nonparametric tests for functional data are challenging to work with because of the potentially high-dimensional nature of the data. One of the main challenges in constructing rank-based tests, such as the Mann-Whitney-Wilcoxon (MWW) rank-sum test, is that the unit of observation is typically a curve, so any rank-based test must first define a way of ranking curves. Several procedures, including depth-based methods, have recently been used to create scores for rank-based tests, but these scores are not constructed under the null hypothesis and often introduce additional, uncontrolled variability. We therefore reconsider the problem of rank-based tests for functional data and develop an alternative approach that incorporates the null hypothesis throughout. Our approach first ranks the realizations of the curves at each measurement point, then computes a summary statistic of the ranks for each subject, and finally re-ranks the summary statistics; we refer to this procedure as a doubly ranked test. We propose two summaries for the middle step: a sufficient statistic and the average rank. As we demonstrate, doubly ranked tests are more powerful while maintaining the nominal type I error rate in the two-sample MWW setting. We also extend our framework to more than two samples, developing a Kruskal-Wallis test for functional data that also exhibits good test characteristics. Finally, we illustrate the use of doubly ranked tests in functional data contexts from materials science, climatology, and public health policy.
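The three steps of the doubly ranked procedure can be illustrated with a minimal pure-Python sketch. It assumes every curve is observed on the same grid of measurement points and uses only the average-rank summary; the sufficient-statistic summary is not specified here, so it is not shown. All function names (`rankdata`, `doubly_ranked_scores`, `mww_test`, `kruskal_wallis_h`) are hypothetical illustrations, not the authors' implementation. The MWW p-value uses the large-sample normal approximation without a tie correction, and the Kruskal-Wallis statistic is returned without a p-value.

```python
import math
from typing import List, Sequence, Tuple


def rankdata(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def doubly_ranked_scores(curves: Sequence[Sequence[float]]) -> List[float]:
    """Average-rank version of the doubly ranked scores (common grid assumed)."""
    n_points = len(curves[0])
    rank_sums = [0.0] * len(curves)
    for t in range(n_points):
        # Step 1: rank all subjects' values at measurement point t.
        for s, r in enumerate(rankdata([c[t] for c in curves])):
            rank_sums[s] += r
    # Step 2: summarize each subject by its average rank over the grid.
    averages = [rs / n_points for rs in rank_sums]
    # Step 3: re-rank the per-subject summaries.
    return rankdata(averages)


def mww_test(scores: Sequence[float], n1: int) -> Tuple[float, float]:
    """Two-sided MWW test on ranked scores; the first n1 entries form group 1.

    Normal approximation without tie correction.
    """
    n = len(scores)
    n2 = n - n1
    u = sum(scores[:n1]) - n1 * (n1 + 1) / 2  # U statistic from the rank sum
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n + 1) / 12)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p


def kruskal_wallis_h(scores: Sequence[float], group_sizes: Sequence[int]) -> float:
    """Kruskal-Wallis H on ranked scores laid out group by group (no tie correction)."""
    n = len(scores)
    total, start = 0.0, 0
    for size in group_sizes:
        r = sum(scores[start:start + size])  # rank sum of this group
        total += r * r / size
        start += size
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


# Usage: group A curves sit uniformly above group B curves.
group_a = [[5, 6, 7], [6, 7, 8], [7, 8, 9]]
group_b = [[1, 2, 3], [2, 3, 4], [0, 1, 2]]
scores = doubly_ranked_scores(group_a + group_b)
# scores -> [4.0, 5.0, 6.0, 2.0, 3.0, 1.0]
z, p = mww_test(scores, n1=len(group_a))
```

For more than two samples, `kruskal_wallis_h` would be applied to the same re-ranked scores and, in the usual large-sample treatment, compared with a chi-square distribution on k - 1 degrees of freedom (for example via `scipy.stats.chi2.sf`). With very small samples like the usage example, an exact or permutation reference distribution would be more appropriate than either normal or chi-square approximations.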