We introduce adaptive learn-then-test (aLTT), an efficient hyperparameter selection procedure that provides finite-sample statistical guarantees on the population risk of AI models. Unlike the existing learn-then-test (LTT) technique, which relies on conventional p-value-based multiple hypothesis testing (MHT), aLTT implements sequential, data-dependent MHT with early termination by leveraging e-processes. As a result, aLTT can reduce the number of testing rounds, making it particularly well suited for scenarios in which testing is costly or poses safety risks. In addition to maintaining statistical validity, in applications such as online policy selection for offline reinforcement learning and prompt engineering, aLTT is shown to match the performance of LTT while requiring only a fraction of the testing rounds.
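To make the e-process mechanism concrete, the sketch below shows a minimal sequential test with early termination for a single hyperparameter. It is an illustration under simplifying assumptions, not the aLTT procedure itself: losses are assumed bounded in [0, 1], the null hypothesis is "population risk at least alpha", a fixed betting fraction `lam` is used (aLTT would select bets adaptively and handle many hyperparameters jointly), and the names `e_process_test`, `alpha`, `delta`, and `lam` are all hypothetical. Under the null, the running product of e-values is a nonnegative supermartingale, so rejecting when it crosses 1/delta controls the type-I error at level delta by Ville's inequality, and the test can stop at the first crossing rather than after a fixed budget of rounds.

```python
import random

def e_process_test(loss_stream, alpha=0.2, delta=0.05, lam=0.5, max_rounds=1000):
    """Sequential test of H0: population risk >= alpha, via a betting e-process.

    Accumulates E_t = prod_{i<=t} (1 + lam * (alpha - loss_i)). For losses in
    [0, 1] with mean >= alpha (i.e., under H0), E_t is a supermartingale, so
    rejecting when E_t >= 1/delta keeps the type-I error below delta
    (Ville's inequality). Stops early at the first threshold crossing.
    """
    e_value = 1.0
    t = 0
    for loss in loss_stream:
        t += 1
        # Bet a fraction lam on the event that the risk is below alpha.
        e_value *= 1.0 + lam * (alpha - loss)
        if e_value >= 1.0 / delta:
            return True, t  # H0 rejected: risk certified below alpha
        if t >= max_rounds:
            break
    return False, t

random.seed(0)
# A "good" hyperparameter: Bernoulli losses with true risk 0.05 << alpha = 0.2.
good_losses = (1.0 if random.random() < 0.05 else 0.0 for _ in range(1000))
rejected, rounds = e_process_test(good_losses)
print(rejected, rounds)
```

Because the e-process typically crosses the 1/delta threshold long before the round budget is exhausted when the true risk is well below alpha, the test certifies the hyperparameter after far fewer testing rounds than a fixed-sample p-value test would require, which is the efficiency gain the abstract describes.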