On the statistical analysis of grouped data: when Pearson $χ^2$ and other divisible statistics are not goodness-of-fit tests



Sara Algeri, Estate V. Khmaladze

Each year, thousands of experiments are analyzed and papers published that involve the statistical analysis of grouped data. While this area of statistics is often perceived, somewhat naively, as saturated, several misconceptions still affect everyday practice, and new frontiers have so far remained unexplored. Researchers must be aware of the limitations affecting their analyses and of the new possibilities available to them. Motivated by this need, the article introduces a unifying approach to the analysis of grouped data, which allows us to study the class of divisible statistics, which includes Pearson's $χ^2$ and the likelihood ratio as special cases, from a fresh perspective. The contributions collected in this manuscript range from modeling and estimation to distribution-free goodness-of-fit tests. Perhaps the most surprising result presented here is that, in a sparse regime, all tests proposed in the literature are dominated by members of the class of weighted linear statistics.
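To make the notion of a divisible statistic concrete, the sketch below assumes the usual form, a sum over cells of a function $g$ of the observed count $ν_i$ and the expected count $n p_i$. Pearson's $χ^2$ and the likelihood ratio then differ only in the choice of $g$. The function names (`divisible_statistic`, `pearson_term`, `likelihood_ratio_term`) and the example counts are illustrative assumptions, not taken from the article.

```python
import math
from typing import Callable, Sequence


def divisible_statistic(
    counts: Sequence[int],
    probs: Sequence[float],
    g: Callable[[float, float], float],
) -> float:
    """Sum g(observed count, expected count) over all cells.

    Assumed form of a divisible statistic: sum_i g(nu_i, n * p_i),
    where n is the total sample size and p_i the hypothesized cell
    probability. The names here are hypothetical.
    """
    n = sum(counts)
    return sum(g(nu, n * p) for nu, p in zip(counts, probs))


def pearson_term(nu: float, expected: float) -> float:
    """Per-cell contribution to Pearson's chi-square."""
    return (nu - expected) ** 2 / expected


def likelihood_ratio_term(nu: float, expected: float) -> float:
    """Per-cell contribution to the likelihood-ratio statistic.

    An empty cell contributes 0, the limit of x * log(x) as x -> 0.
    """
    if nu == 0:
        return 0.0
    return 2.0 * nu * math.log(nu / expected)


# Made-up example: 100 observations in 4 equiprobable cells.
counts = [18, 22, 30, 30]
probs = [0.25, 0.25, 0.25, 0.25]

pearson = divisible_statistic(counts, probs, pearson_term)
lik_ratio = divisible_statistic(counts, probs, likelihood_ratio_term)
print(f"Pearson chi-square: {pearson:.4f}")
print(f"Likelihood ratio:   {lik_ratio:.4f}")
```

Both statistics share the same summation skeleton, so swapping `g` is enough to move between members of the class. The sketch only computes the statistics; it says nothing about their null distributions, which is where the sparse regime discussed in the abstract matters.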

