We introduce Sequential-EDFL (Empirical Dynamic Formal Lift), which applies anytime-valid sequential testing to language model generation stopping. Our approach tracks information lift, defined as the log-likelihood ratio between the full model and deliberately weakened "skeleton" baselines, using self-normalized empirical-Bernstein e-processes that provide formal delta-level error control regardless of stopping time. This delta guarantee controls premature stopping when information lift is insufficient relative to the skeleton, and it does not imply delta control of factual incorrectness or hallucinations. We handle unknown centering through online mean estimation, combine multiple parameters via mixture e-processes, and support adaptive resets under distributional drift. On six benchmarks, Sequential-EDFL reduces generation length by 22 to 28 percent relative to sequential baselines while maintaining delta-level control with 12 percent computational overhead. We introduce automated skeletons (distilled submodels and randomized logits) and show robustness across skeleton families. Composing EDFL with a lightweight correctness gate (sentence boundaries plus a verifier) improves end-task correctness while preserving anytime-valid guarantees by only delaying stopping. Our certificates control information sufficiency, not factual correctness. Specifically, 10.9 percent of stopped sequences remain incorrect even with the gate (13.2 to 22.7 percent without it). EDFL serves as a first-stage filter that can reduce verification burden: when applied to stopped sequences, the gate validates 83 percent of stops, requiring full verification only for the remaining 17 percent, plus all non-stopped sequences. EDFL is not a standalone solution for safety-critical domains.
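As an illustration of the stopping rule described above, the following minimal sketch tracks a mixture of empirical-Bernstein e-processes over per-token lift values and stops once the mixture reaches 1/delta. It is not the paper's implementation: the function name `eb_stop_time`, the rescaling of lifts to [0, 1], the null threshold `mu0`, the grid of `lambdas`, and the pseudo-observation that seeds the running mean are all assumptions chosen for illustration, and the e-process uses one standard empirical-Bernstein form rather than the exact construction in the paper. Adaptive resets under drift and the correctness gate are omitted.

```python
import math


def eb_stop_time(lifts, mu0, delta, lambdas=(0.1, 0.3, 0.5)):
    """Return the 1-based step at which the mixture e-process first
    reaches 1/delta, or None if it never does.

    Assumptions (hypothetical, not from the paper):
      * each lift is rescaled to [0, 1] (values outside are clipped);
      * the null says the conditional mean lift is at most mu0;
      * every lambda lies in [0, 1).
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    if any(not 0.0 <= lam < 1.0 for lam in lambdas):
        raise ValueError("each lambda must lie in [0, 1)")

    log_e = [0.0] * len(lambdas)       # log of each component e-process
    total, count = 0.5, 1              # running mean seeded with one pseudo-observation
    threshold = math.log(1.0 / delta)  # Ville threshold on the log scale

    for t, x in enumerate(lifts, start=1):
        x = min(1.0, max(0.0, x))
        mu_hat = total / count         # predictable: uses past observations only

        for k, lam in enumerate(lambdas):
            psi = -math.log(1.0 - lam) - lam
            # Empirical-Bernstein increment: a reward for lift above mu0,
            # penalized by the squared deviation from the online mean.
            log_e[k] += lam * (x - mu0) - psi * (x - mu_hat) ** 2

        # Equal-weight mixture over lambdas, computed with log-sum-exp.
        top = max(log_e)
        log_mix = top + math.log(
            sum(math.exp(v - top) for v in log_e) / len(log_e)
        )

        total += x
        count += 1

        if log_mix >= threshold:
            return t
    return None


# Consistently high lift lets the process cross the threshold; low lift does not.
high = eb_stop_time([0.9] * 200, mu0=0.5, delta=0.1)
low = eb_stop_time([0.2] * 200, mu0=0.5, delta=0.1)
print(high, low)
```

Under the assumed null (conditional mean lift at most `mu0`), each component is a nonnegative supermartingale starting at 1, so their average is one as well, and Ville's inequality bounds the probability that it ever reaches 1/delta by delta. That bound holds at any stopping time, which is the sense in which the guarantee is anytime-valid; it says nothing about whether the stopped text is factually correct.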