TASR: Training-Free Adaptive Stopping for Iterative Retrieval

Iterative retrieval-augmented generation agents commonly overspend by continuing to retrieve after the model has converged on an answer, incurring calls that change neither the prediction nor the supporting evidence. Existing remedies learn a stopping policy from labeled trajectories, tying the decision to a trained component that must be retrained for each new model or task. We propose TASR (Training-Free Adaptive Stopping Rule), a one-line predicate that fires when the model repeats its previous-round normalized answer and the isotonically calibrated logit margin exceeds 0.25. No classifier or value head is learned, and the threshold is fixed across all twenty-four (model, retriever, corpus) configurations we evaluate. On a 3-model × 2-dataset distractor grid, TASR retains 94.8% of fixed-k=5's macro F1 while using 62.6% of its calls, and exceeds fixed-k=3 by 3.42 F1. The pattern holds on nine open-domain BM25 cells (55.01 F1 at 2.98 calls vs. 54.33 F1 at 3.00 calls for fixed-k=3) and, with calibration frozen on the distractor split, on nine dense-retrieval cells spanning two retriever families, with no significant regressions in either extension. The rule was selected from an exhaustive enumeration of 381 candidate stopping rules, and no alternative Pareto-dominates it on any evaluated configuration. A signal-quality analysis shows that verbalized 1-5 confidence collapses on RLHF-tuned models (96.5% of values equal 5; entropy 0.182 nats), whereas the logit margin achieves 44× better class-conditional separation, grounding the design in a measurable model pathology. TASR thus serves as an auditable, training-free Pareto baseline against which learned stopping controllers can be compared. Code is publicly available.
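The abstract specifies the stopping predicate only through its two conditions: an unchanged normalized answer and a calibrated margin above 0.25. The following is a minimal Python sketch of how such a predicate could be wired up, not the authors' implementation. It assumes SQuAD-style answer normalization, a logit margin defined as the gap between the two largest logits, and a pre-fitted isotonic map stored as step-function knots. The names `normalize_answer`, `logit_margin`, `isotonic_lookup`, and `should_stop`, and the knot values in the usage example, are hypothetical; only the 0.25 threshold comes from the text.

```python
import re
import string
from bisect import bisect_right
from typing import Optional, Sequence

# Threshold stated in the abstract; all other details below are assumptions.
MARGIN_THRESHOLD = 0.25
_PUNCT = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Assumed SQuAD-style normalization: lowercase, strip punctuation,
    drop English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def logit_margin(logits: Sequence[float]) -> float:
    """Assumed definition: gap between the largest and second-largest logit."""
    top = sorted(logits, reverse=True)
    return top[0] - top[1]


def isotonic_lookup(x: float, knots_x: Sequence[float],
                    knots_y: Sequence[float]) -> float:
    """Apply a pre-fitted monotone step map (hypothetical storage format).
    knots_x must be ascending and knots_y non-decreasing."""
    i = bisect_right(knots_x, x) - 1
    # Inputs below the first knot are clamped to the lowest calibrated value.
    return knots_y[max(i, 0)]


def should_stop(prev_answer: Optional[str], curr_answer: str,
                calibrated_margin: float,
                threshold: float = MARGIN_THRESHOLD) -> bool:
    """Stop only if the normalized answer repeats AND the calibrated
    margin strictly exceeds the threshold."""
    if prev_answer is None:  # first round: nothing to compare against
        return False
    repeated = normalize_answer(prev_answer) == normalize_answer(curr_answer)
    return repeated and calibrated_margin > threshold


if __name__ == "__main__":
    knots_x = [0.0, 1.0, 2.0, 4.0]    # hypothetical calibration knots
    knots_y = [0.05, 0.20, 0.45, 0.80]
    m = isotonic_lookup(logit_margin([3.1, 1.0, 0.2]), knots_x, knots_y)
    print(m, should_stop("The Eiffel Tower", "eiffel tower.", m))
```

With these made-up knots, a raw margin of about 2.1 maps to 0.45, and because the two answers normalize to the same string, the predicate fires. Keeping calibration as a fixed lookup table mirrors the abstract's point that nothing is trained at inference time; only the table would need to be refit for a new model.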
