Prior-free blind detection of information leakage from model predictions

Data leakage, the contamination of a model with information unavailable at baseline, is the dominant reproducibility failure in machine-learning-based science. Yet existing detection tools require training code, external data, or domain expertise, and none of them operates on the artifact an auditor most often holds: the model's output. We ask what can be decided about leakage from predictions and outcomes alone. We give a decision-theoretic framework in which leakage diagnostics are functionals of the joint law of predicted risk and outcome. These functionals are parameterized by a threshold weighting tied to both proper scoring rules and decision-curve analysis. We prove a sharp impossibility: a recalibrated leak that matches an honest model's calibration and discrimination is indistinguishable from honest performance by \emph{any} function of the predictions. This broad class of leakage is therefore detectable only against an externally supplied ceiling on achievable discrimination. We then prove what leakage cannot hide. A near-deterministic subgroup, the signature of a near-label leak, produces a sustained unit-purity head that no legitimate predictor of a non-deterministic outcome can manufacture, and this yields a prior-free test. These results organize leakage into a trichotomy (miscalibrated, broad-calibrated, and deterministic), and each type has a matched detector and failure mode. We validate on UK Biobank using time-windowed comorbidity leakage of known, graded severity. On this endpoint we measure a detection floor of $\Delta\cstar \approx 0.007$. Residual leakage below this floor is undetectable from output and too small to alter conclusions. The numerical floor is cohort- and endpoint-specific, but the structural lesson is general: output-only detection fails where residual leakage is indistinguishable from an honestly stronger predictor. The test returns a verdict on a prediction vector in under a second on commodity hardware.
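The following Python sketch illustrates the deterministic case. It ranks cases by predicted risk and measures the length of the leading run of positive outcomes, which is our reading of a "unit-purity head." It then bounds how likely such a run would be for an honest predictor.

The function names `purity_head_length` and `purity_head_pvalue_bound` are hypothetical. The bound also rests on two assumptions made only for this illustration:

- a risk ceiling `pi_max` < 1, and
- conditional independence of outcomes given baseline features.

This is a simplification for exposition. It is not the prior-free test described above, which does not require such a ceiling.

```python
import math
from typing import Sequence


def purity_head_length(risk: Sequence[float], outcome: Sequence[int]) -> int:
    """Length of the leading run of positive outcomes when cases are ranked
    by predicted risk, highest first (hypothetical helper for illustration).

    Ties in `risk` keep their original index order (Python's sort is stable,
    also with reverse=True); this tie-breaking is an arbitrary sketch choice.
    """
    if len(risk) != len(outcome):
        raise ValueError("risk and outcome must have the same length")
    order = sorted(range(len(risk)), key=lambda i: risk[i], reverse=True)
    run = 0
    for i in order:
        if outcome[i] != 1:  # first negative outcome ends the pure head
            break
        run += 1
    return run


def purity_head_pvalue_bound(run_length: int, pi_max: float) -> float:
    """Upper bound on P(head length >= run_length) for an honest predictor.

    ASSUMED null for this sketch only (not the prior-free construction):
    outcomes are conditionally independent given baseline features, every
    individual's true risk is at most `pi_max` < 1, and the predictions were
    not fit on these outcomes. The top `run_length` ranked cases are then all
    positive with probability at most pi_max ** run_length.
    """
    if not 0.0 <= pi_max < 1.0:
        raise ValueError("pi_max must lie in [0, 1)")
    if run_length < 0:
        raise ValueError("run_length must be non-negative")
    return pi_max ** run_length


# Toy data: the four highest-ranked cases are all positive.
risk = [0.97, 0.95, 0.93, 0.91, 0.40, 0.35, 0.10]
outcome = [1, 1, 1, 1, 0, 1, 0]
L = purity_head_length(risk, outcome)
print(L)  # 4
print(purity_head_pvalue_bound(L, pi_max=0.5))  # 0.0625
```

The bound decays geometrically in the run length, so under the assumed null even a moderate ceiling makes long pure heads implausible. A run of 4 at `pi_max = 0.5` is unremarkable. A run of 40 would give a bound on the order of 1e-12.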
