Nonparametric tests for functional data are challenging to work with because functional data are potentially high-dimensional. One of the main challenges for rank-based tests, such as the Mann-Whitney or Wilcoxon rank sum (MWW) tests, is that the unit of observation is a curve, so any rank-based test must specify a way of ranking curves. While several procedures, including depth-based methods, have recently been used to create scores for rank-based tests, these scores are not constructed under the null hypothesis and often introduce additional, uncontrolled variability. We therefore reconsider the problem of rank-based tests for functional data and develop an alternative approach that incorporates the null hypothesis throughout. Our approach first ranks realizations from the curves at each time point, then summarizes the ranks for each subject using a sufficient statistic we derive, and finally re-ranks the sufficient statistics, a procedure we refer to as a doubly ranked test. As we demonstrate, doubly ranked tests are more powerful while maintaining ideal type I error in the two-sample MWW setting. We also extend our framework to more than two samples, developing a Kruskal-Wallis test for functional data that exhibits good test characteristics as well. Finally, we illustrate the use of doubly ranked tests in functional data contexts from materials science, climatology, and public health policy.
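The three steps described above (rank at each time point, summarize per subject, re-rank) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not give the form of the sufficient statistic, so the per-subject mean rank is used here as an assumed placeholder, and the function names `midranks`, `doubly_ranked_scores`, and `rank_sum_test` are hypothetical. The final test uses the standard normal approximation to the Wilcoxon rank sum statistic without a tie correction, and the simulated curves are toy data.

```python
import math
import random


def midranks(values):
    """Return 1-based ranks, assigning tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def doubly_ranked_scores(curves):
    """curves: one list per subject, all observed on the same time grid."""
    n, n_times = len(curves), len(curves[0])
    # Step 1: rank the subjects against each other at every time point.
    per_time = [midranks([curves[s][t] for s in range(n)]) for t in range(n_times)]
    # Step 2: summarize each subject's ranks.
    # ASSUMPTION: the mean rank stands in for the paper's sufficient statistic.
    summary = [sum(per_time[t][s] for t in range(n_times)) / n_times for s in range(n)]
    # Step 3: re-rank the per-subject summaries.
    return midranks(summary)


def rank_sum_test(scores, labels):
    """Two-sided rank sum test on rank scores; labels are 0 or 1.

    Uses the normal approximation with no tie correction.
    """
    n1 = sum(1 for g in labels if g == 0)
    n2 = len(labels) - n1
    w = sum(r for r, g in zip(scores, labels) if g == 0)
    mean = n1 * (n1 + n2 + 1) / 2
    var = n1 * n2 * (n1 + n2 + 1) / 12
    z = (w - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p


# Toy data: two groups of noisy sine curves, the second shifted upward by 1.
rng = random.Random(0)
grid = [t / 19 for t in range(20)]
group0 = [[math.sin(2 * math.pi * t) + rng.gauss(0, 1) for t in grid] for _ in range(15)]
group1 = [[math.sin(2 * math.pi * t) + 1.0 + rng.gauss(0, 1) for t in grid] for _ in range(15)]
labels = [0] * 15 + [1] * 15

scores = doubly_ranked_scores(group0 + group1)
z, p = rank_sum_test(scores, labels)
print(f"z = {z:.3f}, p = {p:.4g}")
```

Because the final scores are themselves ranks of a single number per subject, any standard univariate rank test can be applied to them; for more than two groups, a Kruskal-Wallis statistic could be computed on the same scores in place of `rank_sum_test`.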