Hypothesis testing and confidence sets: why Bayesian not frequentist, and how to set a prior with a regulatory authority

From arXiv: 142 pages, 72 figures, 14 tables; v7 has a new Appendix L on the relationship between complete families of critical regions and confidence sets, and an anecdote illustrating the adverse frequentist effects of hard work

We marshal the arguments for preferring Bayesian hypothesis testing and confidence sets to frequentist ones. We define admissible solutions to inference problems, noting that Bayesian solutions are admissible. We give seven weaker common-sense criteria for solutions to inference problems, all failed by these frequentist methods but satisfied by any admissible method. We note that handicapping Bayesian methods so that they satisfy criteria on type I error rate yields pseudo-Bayesian methods that are frequentist, not Bayesian, in nature. We give five examples showing the differences between Bayesian and frequentist methods: the first requires little calculus, the second shows in the abstract what is wrong with these frequentist methods, the third illustrates information conservation, the fourth shows that the same problems arise in everyday statistical problems, and the fifth illustrates how, on some real-life inference problems, Bayesian methods require less data than fixed sample-size (resp. pseudo-Bayesian) frequentist hypothesis testing by factors exceeding 3000 (resp. 300), without recourse to informative priors. To address the issue of different parties with opposing interests reaching agreement on a prior, we illustrate the beneficial effects of a Bayesian "Let the data decide" policy, both on results under a wide variety of conditions and on motivation to reach a common prior by consent. We show that in general the frequentist confidence level contains less relevant Shannon information than the Bayesian posterior, and give an example where no deterministic frequentist critical regions give any relevant information even though the Bayesian posterior contains up to the maximum possible amount. In contrast, use of the Bayesian prior allows construction of non-deterministic critical regions for which the Bayesian posterior can be recovered from the frequentist confidence.
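As a rough illustration of the kind of difference the abstract describes, the sketch below compares a Bayesian posterior probability with a one-sided frequentist p-value on a toy problem. It is not taken from the paper: the two point hypotheses (mean 0 versus mean 1), the known unit standard deviation, the equal prior weights, and the function names `posterior_h1` and `one_sided_p_value` are all assumptions chosen for simplicity.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std dev sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def posterior_h1(xbar, n, mu0=0.0, mu1=1.0, sigma=1.0, prior_h1=0.5):
    """Posterior probability of H1 (mean mu1) against H0 (mean mu0),
    given the sample mean xbar of n observations with known sigma.
    All default values are illustrative assumptions."""
    se = sigma / math.sqrt(n)  # standard error of the sample mean
    like0 = normal_pdf(xbar, mu0, se)
    like1 = normal_pdf(xbar, mu1, se)
    return prior_h1 * like1 / (prior_h1 * like1 + (1.0 - prior_h1) * like0)


def one_sided_p_value(xbar, n, mu0=0.0, sigma=1.0):
    """One-sided p-value for H0: mean = mu0 against larger means."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    return 1.0 - normal_cdf(z)


if __name__ == "__main__":
    # A sample mean exactly halfway between the two hypothesised means.
    xbar, n = 0.5, 16
    print("posterior P(H1 | data):", posterior_h1(xbar, n))
    print("one-sided p-value under H0:", one_sided_p_value(xbar, n))
```

With a sample mean exactly halfway between the two hypothesised means, the likelihoods are equal, so under these assumptions the posterior stays at the prior value of 0.5, while the p-value falls below the conventional 0.05 threshold once n reaches 16. This is only a toy contrast between the two kinds of output; it does not reproduce any of the paper's five examples.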
