Reasoning LLMs trained with long chain-of-thought often overthink: they spend tokens on redundant reflection and transitions that inflate cost without improving accuracy. Static activation steering (e.g.,\ SEAL) suppresses such content with a fixed vector, but applies the same strength regardless of how redundant the current chunk actually is. We describe PID-steering, a training-free, decoding-time method that modulates the steering strength with a proportional-integral-derivative (PID) controller driven by a lightweight chunk-level redundancy classifier. On a subset of GSM8K with DeepSeek-R1-Distill-Qwen-1.5B, the method improves accuracy from 85.7\% to 89.6\% (+3.9 pp) while cutting average output length from 1026 to 790 tokens ($-$23\%). We present this as a small-scale proof of concept rather than a benchmark result.
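
To make the control loop concrete, the following is a minimal sketch of how a PID controller could turn per-chunk redundancy scores into a steering strength. It is an illustration under stated assumptions, not the implementation evaluated above: the class name \texttt{PIDSteeringController}, the helper \texttt{apply\_steering}, the gains, the target redundancy level, the clipping bound, and the sign convention (subtracting the scaled vector from the hidden state) are all hypothetical choices made for this example. The classifier is assumed to emit a score in $[0, 1]$ once per chunk.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PIDSteeringController:
    """Hypothetical PID controller: chunk redundancy score -> steering strength."""

    kp: float = 1.0         # proportional gain (assumed value)
    ki: float = 0.1         # integral gain (assumed value)
    kd: float = 0.05        # derivative gain (assumed value)
    target: float = 0.2     # assumed tolerated redundancy level
    alpha_max: float = 2.0  # assumed upper bound on steering strength
    _integral: float = field(default=0.0, init=False, repr=False)
    _prev_error: float = field(default=0.0, init=False, repr=False)

    def update(self, redundancy: float) -> float:
        """Consume one chunk-level score and return the clipped strength."""
        error = redundancy - self.target
        self._integral += error
        derivative = error - self._prev_error
        self._prev_error = error
        raw = self.kp * error + self.ki * self._integral + self.kd * derivative
        # No steering below the target; cap the strength above it.
        return min(max(raw, 0.0), self.alpha_max)


def apply_steering(hidden: List[float], vector: List[float], alpha: float) -> List[float]:
    """Shift a hidden state against the steering vector (assumed sign convention)."""
    return [h - alpha * v for h, v in zip(hidden, vector)]


controller = PIDSteeringController()
scores = [0.1, 0.8, 0.9, 0.3]  # made-up classifier outputs for four chunks
strengths = [controller.update(s) for s in scores]
steered = apply_steering([1.0, 2.0], [0.5, -0.5], strengths[1])
```

With these made-up scores the strength stays at zero for the first, non-redundant chunk and rises once redundancy exceeds the target, which is the behaviour a fixed-strength vector cannot provide. The sketch omits anti-windup handling for the integral term, which a practical controller would likely need.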