Each year, thousands of experiments are analyzed and papers published that involve the statistical analysis of grouped data. While this area of statistics is often perceived, somewhat naively, as saturated, several misconceptions still affect everyday practice, and new frontiers remain unexplored. Researchers must be aware of the limitations affecting their analyses and of the new possibilities available to them. Motivated by this need, the article introduces a unifying approach to the analysis of grouped data, which allows us to study the class of divisible statistics (which includes Pearson's $\chi^2$ and the likelihood ratio as special cases) from a fresh perspective. The contributions collected in this manuscript range from modeling and estimation to distribution-free goodness-of-fit tests. Perhaps the most surprising result presented here is that, in a sparse regime, all tests proposed in the literature are dominated by members of the class of weighted linear statistics.