The Certainty Bound: Structural Limits on Scientific Reliability

Explanations of the replication crisis often emphasize misconduct, questionable research practices, or misaligned incentives, implying that behavioral reform is sufficient. This paper argues that a substantial component of the crisis is architectural: within publication systems built on binary significance decisions, even perfectly diligent researchers face structural limits on the reliability they can deliver. Conditional on a significant result, the posterior log-odds that the tested hypothesis is true equal the prior log-odds plus log(Lambda), where Lambda = (1-beta)/alpha is the experimental leverage. Interpreted architecturally, this implies a hard constraint: once evidence is coarsened to a binary significance decision, the decision rule contributes exactly log(Lambda) to the posterior log-odds, regardless of how strongly the underlying data favored the hypothesis. Consequently, a target reliability tau (the posterior probability that a significant finding is true) is feasible if and only if the prior pi meets a critical value pi_crit determined by tau and Lambda. Because power cannot exceed 1, Lambda is capped at 1/alpha, so under a fixed alpha a shortfall generally cannot be rescued by sample size alone. These results concern binary significance-based decision architectures; they do not bound inference based on full likelihoods or richer continuous evidence summaries. Two mechanisms can drive effective leverage to 1 without bad faith: persistent unmeasured confounding in observational studies and unbounded specification search under publication pressure. Two collapse results formalize these mechanisms, while the Replication Pipeline Theorem and the Minimum Pipeline Depth Corollary identify a quantitative evidentiary standard for escaping the bound. Using independently documented parameters for pre-reform psychology (pi about 0.10, power about 0.35), the framework implies a replication rate of 36%, consistent with the rate observed by the Open Science Collaboration. The framework also provides quantitative bridges to Popper, Kuhn, and Lakatos. In low-prior settings below the single-study feasibility threshold, the natural unit of evidence is the replication pipeline rather than the individual experiment.
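
The arithmetic behind the leverage identity and the feasibility threshold can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: all function names are hypothetical; alpha = 0.05 and tau = 0.95 are assumed for illustration (the abstract states neither); pi_crit = tau / (tau + (1 - tau) * Lambda) is obtained by solving prior log-odds + log(Lambda) >= logit(tau) for the prior. The pipeline-depth helper assumes independent studies that are all significant and each add exactly log(Lambda), and the specification-search function uses a simplified model with m independent specifications; neither is claimed to be the paper's exact corollary or collapse theorem. The sketch does not attempt to reproduce the 36% replication estimate, whose model is not specified here.

```python
import math


def leverage(alpha: float, power: float) -> float:
    """Experimental leverage Lambda = (1 - beta) / alpha, where power = 1 - beta."""
    return power / alpha


def logit(p: float) -> float:
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1 - p))


def posterior_after_significance(prior: float, alpha: float, power: float) -> float:
    """Posterior probability the hypothesis is true after one significant result.

    Uses: posterior log-odds = prior log-odds + log(Lambda).
    """
    log_odds = logit(prior) + math.log(leverage(alpha, power))
    return 1 / (1 + math.exp(-log_odds))


def critical_prior(tau: float, alpha: float, power: float) -> float:
    """Smallest prior pi at which one significant result reaches posterior tau.

    Solves logit(pi) + log(Lambda) >= logit(tau) for pi.
    """
    lam = leverage(alpha, power)
    return tau / (tau + (1 - tau) * lam)


def min_pipeline_depth(prior: float, tau: float, alpha: float, power: float) -> int:
    """Hypothetical helper: number of significant studies needed to reach tau.

    Assumes studies are independent and each adds exactly log(Lambda).
    """
    lam = leverage(alpha, power)
    if lam <= 1:
        raise ValueError("Lambda <= 1: a significant result carries no evidence")
    gap = logit(tau) - logit(prior)
    return max(0, math.ceil(gap / math.log(lam)))


def specification_search_leverage(alpha: float, power: float, m: int) -> float:
    """Hypothetical collapse model: report 'significant' if any of m independent
    specifications is significant. Effective alpha and effective power both
    approach 1 as m grows, so effective leverage approaches 1.
    """
    alpha_eff = 1 - (1 - alpha) ** m
    power_eff = 1 - (1 - power) ** m
    return power_eff / alpha_eff


if __name__ == "__main__":
    alpha = 0.05  # assumed; the abstract does not state alpha
    pi, power, tau = 0.10, 0.35, 0.95  # tau = 0.95 is an illustrative target
    print(f"{leverage(alpha, power):.2f}")  # 7.00
    print(f"{posterior_after_significance(pi, alpha, power):.4f}")  # 0.4375
    print(f"{critical_prior(tau, alpha, power):.4f}")  # 0.7308
    print(f"{critical_prior(tau, alpha, 1.0):.4f}")  # 0.4872, even at power = 1
    print(min_pipeline_depth(pi, tau, alpha, power))  # 3
    for m in (1, 5, 20, 200):
        print(m, round(specification_search_leverage(alpha, power, m), 3))
```

With these inputs, even unlimited sample size (power = 1, so Lambda = 1/alpha = 20) leaves pi_crit near 0.49, far above pi = 0.10, which illustrates why a larger sample alone cannot close the gap. Under the independence assumption, three significant results in a row would clear tau = 0.95, while the specification-search model shows effective leverage shrinking toward 1 as m grows.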