Each year, thousands of experiments are analyzed and papers published that involve the statistical analysis of grouped data. While this area of statistics is often perceived, somewhat naively, as saturated, several misconceptions still affect everyday practice, and new frontiers remain unexplored. Researchers must be aware of the limitations affecting their analyses and of the new possibilities at hand. The article introduces a unifying approach to the analysis of divisible statistics (a class that includes Pearson's $χ^2$, the likelihood ratio, and spectral statistics as special cases) in the setting where the number of bins or groups is large, so that many of the observed frequencies are small or moderate. The performance of the tests is analyzed against the class of contiguous (local) alternatives. Perhaps the most surprising result is that, in this `sparse' regime, most of the tests proposed in the literature can be modified to produce more powerful tests, and no single test based on a divisible statistic leads to a goodness-of-fit test. Distribution-free goodness-of-fit tests are also constructed.