Connecting Simple and Precise P-values to Complex and Ambiguous Realities

From arXiv, 26 pages. Appears with comments in Scandinavian Journal of Statistics 2023, issue 3. Main article: Greenland, S. (2023). Divergence vs. decision P-values: A distinction worth making in theory and keeping in practice. Scandinavian Journal of Statistics, 50, 1-35; corrected version at arXiv:2301.02478.

Mathematics is a limited component of solutions to real-world problems, as it expresses only what is expected to be true if all our assumptions are correct, including implicit assumptions that are omnipresent and often incorrect. Statistical methods are rife with implicit assumptions whose violation can be life-threatening when results from them are used to set policy. Among them are that there is human equipoise or unbiasedness in data generation, management, analysis, and reporting. These assumptions correspond to levels of cooperation, competence, neutrality, and integrity that are absent more often than we would like to believe. Given this harsh reality, we should ask what meaning, if any, we can assign to the P-values, 'statistical significance' declarations, 'confidence' intervals, and posterior probabilities that are used to decide what and how to present (or spin) discussions of analyzed data. By themselves, P-values and CI do not test any hypothesis, nor do they measure the significance of results or the confidence we should have in them. The sense otherwise is an ongoing cultural error perpetuated by large segments of the statistical and research community via misleading terminology. So-called 'inferential' statistics can only become contextually interpretable when derived explicitly from causal stories about the real data generator (such as randomization), and can only become reliable when those stories are based on valid and public documentation of the physical mechanisms that generated the data. Absent these assurances, traditional interpretations of statistical results become pernicious fictions that need to be replaced by far more circumspect descriptions of data and model relations.
