Comparing statistical likelihoods with diagnostic probabilities based on directly observed proportions to help understand the replication crisis

Diagnosticians use an observed proportion as a direct estimate of the posterior probability of a diagnosis. A diagnostician might therefore regard a continuous Gaussian distribution of possible numerical outcomes, conditional on the information in the study methods and data, as a distribution of posterior probabilities. Similarly, they might regard the distribution of possible means based on the standard error of the mean (SEM) as a posterior probability distribution. If the converse likelihood distribution of the observed mean conditional on any hypothetical mean (e.g. the null hypothesis) is assumed to be the same as this posterior distribution (as is customary), then by Bayes' rule the prior distribution of all possible hypothetical means is uniform. It follows that the probability Q of the theoretically true mean falling into the tail beyond a null hypothesis is equal to that tail's area as a proportion of the whole distribution. It also follows that the P value (the probability of the observed mean or something more extreme conditional on the null hypothesis) is equal to Q. Replication involves two independent studies, which doubles the variance of the combined posterior probability distribution. So, if the original effect size was 1.96, the number of observations was 100, the SEM was 1 and the original one-sided P value was 0.025, the theoretical probability of a replicating study again getting a P value of 0.025 or less is only 0.283. If this doubled variance is applied when calculating the sample size for a power of 80%, the required number of observations is double that given by conventional approaches. If a replicating study is to achieve a P value of 0.025 or less yet again with a probability of 0.8, then the power calculation requires 3 times as many observations. This might explain the replication crisis.
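The figures quoted above can be checked with a short calculation. The following Python sketch is one reading of the arithmetic, not necessarily the author's own procedure: it assumes a one-sided test at 0.025, a population standard deviation of 10 (implied by an SEM of 1 with 100 observations), and that the doubling and tripling of observations correspond to multiplying the variance by 2 and 3 respectively. The function names and the `variance_factor` parameter are hypothetical and introduced only for illustration.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, SD 1)


def replication_probability(effect, sem, alpha=0.025, variance_factor=2.0):
    """Probability that a replicating study again gives one-sided P <= alpha.

    Assumption: the variance of the combined distribution is
    variance_factor * sem**2 (a factor of 2 for two independent studies).
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)              # about 1.96 for alpha = 0.025
    z_effect = effect / (sem * sqrt(variance_factor))   # effect in units of the widened SD
    return STD_NORMAL.cdf(z_effect - z_crit)


def required_n(effect, sd, alpha=0.025, power=0.8, variance_factor=1.0):
    """Number of observations for a one-sided z-test with the given power.

    variance_factor = 1 is the conventional calculation; 2 and 3 are
    assumed here to represent the doubled and tripled variance.
    """
    z_total = STD_NORMAL.inv_cdf(1 - alpha) + STD_NORMAL.inv_cdf(power)
    return variance_factor * (z_total * sd / effect) ** 2


if __name__ == "__main__":
    effect, n, sem = 1.96, 100, 1.0
    sd = sem * sqrt(n)  # population SD implied by SEM = 1 with n = 100

    print("P(replication):", round(replication_probability(effect, sem), 3))
    for factor in (1.0, 2.0, 3.0):
        n_needed = required_n(effect, sd, variance_factor=factor)
        print(f"variance x{factor:.0f}: n = {n_needed:.1f}")
```

Under these assumptions the replication probability rounds to 0.283, and the required number of observations scales in direct proportion to the variance factor, so the doubled and tripled variances give twice and three times the conventional sample size.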
