Modern science increasingly relies on ever-growing observational datasets and automated inference pipelines, under the implicit belief that accumulating more data makes scientific conclusions more reliable. Here we show that this belief can fail in a fundamental and irreversible way. We identify a structural regime in which standard inference procedures run smoothly, remain well calibrated, and pass conventional diagnostic checks, yet systematically converge to incorrect conclusions. This failure arises when the reliability of observations degrades in a manner that is intrinsically unobservable to the inference process itself. Using minimal synthetic experiments, we demonstrate that in this regime additional data do not correct error but instead amplify it, while residual-based and goodness-of-fit diagnostics remain misleadingly normal. These results reveal an intrinsic limit of data-driven science: stability, convergence, and confidence are not sufficient indicators of epistemic validity. We argue that inference cannot be treated as an unconditional consequence of data availability, but must instead be governed by explicit constraints on the integrity of the observational process.
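The regime described above can be sketched with a toy experiment. The snippet below is an illustrative construction, not the paper's actual setup: all names (`fit`, `n_clean`, `bias`) and parameter values are assumptions chosen for demonstration. A sensor silently acquires a fixed offset after some point in the data stream; the inference procedure (a plain sample mean with its reported standard error) cannot see this, so the estimate drifts away from the truth as data accumulate, while the reported uncertainty shrinks.

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true = 0.0   # ground-truth quantity being estimated
bias = 1.0         # offset acquired after the sensor silently degrades
n_clean = 1_000    # observations collected before the degradation

def fit(n):
    """Sample-mean inference on n observations; the first n_clean are
    clean, and every later one carries an offset that is unobservable
    to the inference procedure itself."""
    y = theta_true + rng.normal(0.0, 1.0, size=n)
    y[n_clean:] += bias                      # unobservable degradation
    est = y.mean()                           # point estimate
    se = y.std(ddof=1) / np.sqrt(n)          # reported standard error
    return est, se

for n in (1_000, 10_000, 100_000):
    est, se = fit(n)
    print(f"n={n:>7}: estimate={est:+.3f} ± {se:.4f}  (truth {theta_true})")
```

In this sketch the estimation error grows toward `bias` as `n` increases, yet the reported standard error keeps shrinking and the residuals remain approximately Gaussian, so conventional diagnostics raise no alarm, which is the qualitative behavior the abstract attributes to its minimal synthetic experiments.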