There are two distinct definitions of 'P-value' for evaluating a proposed hypothesis or model for the process that generated an observed dataset. The original definition starts with a measure of the divergence of the dataset from what was expected under the model, such as a sum of squares or a deviance statistic. A P-value is then the ordinal location of that measure in a reference distribution computed from the model and the data, and is treated as a unit-scaled index of compatibility between the data and the model. In the other definition, a P-value is a random variable on the unit interval whose realizations can be compared to a cutoff α to generate a decision rule with known error rates under the model and specific alternatives. It is commonly assumed that realizations of such decision P-values always correspond to divergence P-values. But this need not be so: decision P-values can violate intuitive single-sample coherence criteria that divergence P-values satisfy. It is thus argued that divergence and decision P-values should be carefully distinguished in teaching, and that divergence P-values are the relevant choice when the analysis goal is to summarize evidence rather than to implement a decision rule.
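To make the contrast concrete, the following is a minimal sketch under assumptions of our own choosing, not an example taken from the source: a binomial model with success probability p, the Pearson chi-square statistic as the divergence measure, and the exact reference distribution obtained by enumerating every possible count. `divergence_p_value` returns the ordinal location Pr(T ≥ t_obs) of the observed divergence. For the decision side, we use a randomized P-value (a standard construction for discrete data that is exactly uniform under the model) as one illustrative decision P-value. All function names are hypothetical.

```python
import math
import random


def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial probability of k successes in n trials (the assumed model)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def pearson_stat(k: int, n: int, p: float) -> float:
    """Divergence of count k from its model expectation n*p (Pearson chi-square)."""
    return (k - n * p) ** 2 / (n * p * (1 - p))


def divergence_p_value(k_obs: int, n: int, p: float, tol: float = 1e-9) -> float:
    """Ordinal location of the observed divergence in the exact reference
    distribution under the model: Pr(T >= t_obs)."""
    t_obs = pearson_stat(k_obs, n, p)
    return sum(binom_pmf(k, n, p) for k in range(n + 1)
               if pearson_stat(k, n, p) >= t_obs - tol)


def randomized_p_value(k_obs: int, n: int, p: float, u: float,
                       tol: float = 1e-9) -> float:
    """Illustrative decision P-value: Pr(T > t_obs) + u * Pr(T = t_obs),
    with u an auxiliary Uniform(0, 1) draw unrelated to the data."""
    t_obs = pearson_stat(k_obs, n, p)
    above = tied = 0.0
    for k in range(n + 1):
        t = pearson_stat(k, n, p)
        if t > t_obs + tol:
            above += binom_pmf(k, n, p)
        elif abs(t - t_obs) <= tol:
            tied += binom_pmf(k, n, p)
    return above + u * tied


def reject(p_value: float, alpha: float = 0.05) -> bool:
    """Decision rule: reject the model when the P-value is at or below alpha."""
    return p_value <= alpha


if __name__ == "__main__":
    n, p, k_obs = 10, 0.5, 8
    p_div = divergence_p_value(k_obs, n, p)
    # The auxiliary draw changes from analyst to analyst, even for identical data.
    u = random.Random().random()
    p_dec = randomized_p_value(k_obs, n, p, u)
    print(f"divergence P = {p_div:.4f}, reject: {reject(p_div)}")
    print(f"randomized P = {p_dec:.4f} (u = {u:.3f}), reject: {reject(p_dec)}")
```

With n = 10, p = 0.5 and 8 observed successes, the divergence P-value is fixed at 112/1024 ≈ 0.109. The randomized P-value ranges from 22/1024 ≈ 0.021 (u = 0) up to that same 0.109 (u = 1). Two analysts holding the same data can therefore reach opposite decisions at α = 0.05. The randomized rule still attains its nominal error rate under the model, but its realization does not summarize the divergence of this one sample from the model.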