Test-time adaptation offers a promising avenue for improving reasoning performance in large language models without additional supervision, but existing approaches typically apply a uniform optimization objective to all inputs, leading to inefficient or unstable adaptation on heterogeneous reasoning problems. We propose DiSCTT, a difficulty-aware, consensus-guided self-curriculum framework that dynamically allocates test-time optimization strategies based on instance-level epistemic uncertainty, estimated from agreement among sampled reasoning trajectories. Inputs with high consensus are consolidated via supervised fine-tuning, using majority-agreed solutions as pseudo-labels, while low-consensus inputs are optimized via reinforcement learning with a consensus-regularized objective that encourages diversity under relevance constraints. Across a broad suite of mathematical and general reasoning benchmarks, DiSCTT consistently outperforms strong test-time adaptation baselines, achieving higher accuracy with lower variance while requiring substantially less computation and wall-clock training time. These results demonstrate that explicitly accounting for instance difficulty and uncertainty enables more stable, efficient, and effective test-time adaptation for reasoning models.
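The routing decision described above can be sketched as follows: estimate consensus as the majority-vote agreement fraction over sampled trajectory answers, then send high-consensus inputs to the SFT branch (with the majority answer as pseudo-label) and low-consensus inputs to the RL branch. This is a minimal illustrative sketch; the function names and the threshold value are assumptions for exposition, not details taken from the paper.

```python
from collections import Counter

def consensus_score(answers):
    """Return (majority answer, fraction of sampled trajectories agreeing on it).

    `answers` holds the final answers extracted from sampled reasoning
    trajectories for a single input. Agreement fraction serves as a proxy
    for instance-level epistemic uncertainty (high agreement = low uncertainty).
    """
    counts = Counter(answers)
    top_answer, top_count = counts.most_common(1)[0]
    return top_answer, top_count / len(answers)

def route(answers, threshold=0.7):
    """Allocate a test-time optimization strategy per input.

    High-consensus inputs are consolidated via SFT on the majority-agreed
    pseudo-label; low-consensus inputs go to the RL branch. The threshold
    of 0.7 is a hypothetical choice for illustration.
    """
    label, score = consensus_score(answers)
    if score >= threshold:
        return ("sft", label)  # consolidate: majority answer as pseudo-label
    return ("rl", None)        # explore: consensus-regularized RL objective
```

For example, `route(["42"] * 8 + ["7"] * 2)` routes to the SFT branch with pseudo-label `"42"`, while `route(["a", "b", "c", "d"])` routes to the RL branch.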