Contrastive Decoding (CD) enhances the generation quality of large language models (LLMs) but incurs significant additional computational overhead due to the need for an auxiliary model. Existing internal self-contrastive decoding methods, such as Decoding by Contrasting Layers (DoLa), focus on discrepancies across different layers, an approach that is notably unstable on small-scale models. In this work, based on the observation that LLMs exhibit local preferences, we propose a novel contrastive guidance strategy along the temporal dimension, namely Temporal Guidance (TeGu). Our method leverages Multi-Token Prediction (MTP) to construct weaker amateur predictions for model self-contrast. To standardize the implementation of this mechanism, we further introduce a lightweight Conditional MTP Projector (cMTPP), which avoids maintaining the multiple independent networks required by other MTP modules. Across various model series and benchmarks, TeGu achieves significant performance improvements while maintaining low additional memory consumption and computational overhead.
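To make the self-contrast idea concrete, the sketch below shows the generic contrastive decoding scoring rule that TeGu builds on: an expert distribution is contrasted against a weaker amateur distribution, restricted to tokens the expert already considers plausible. This is an illustrative NumPy sketch of standard CD (the function name, the `alpha` plausibility threshold, and the `beta` contrast strength are our illustrative choices), not the paper's TeGu or cMTPP implementation, in which the amateur logits would come from the MTP head rather than a separate model.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over a 1-D logit vector."""
    shifted = logits - logits.max()
    return shifted - np.log(np.exp(shifted).sum())

def contrastive_decode(expert_logits, amateur_logits, alpha=0.1, beta=1.0):
    """Generic contrastive decoding step (illustrative sketch).

    Scores tokens by how much the expert prefers them over the amateur,
    keeping only tokens whose expert probability is within a factor
    `alpha` of the expert's top token (the plausibility constraint).
    """
    expert_logp = log_softmax(expert_logits)
    amateur_logp = log_softmax(amateur_logits)
    # Plausibility mask: token must satisfy p(x) >= alpha * max_x p(x).
    plausible = expert_logp >= np.log(alpha) + expert_logp.max()
    scores = np.where(plausible, expert_logp - beta * amateur_logp, -np.inf)
    return int(np.argmax(scores))

# The contrastive choice can differ from plain greedy decoding: here the
# amateur is also confident in token 0, so CD promotes token 1 instead.
expert = np.array([2.0, 1.9, -5.0])
amateur = np.array([2.0, 0.0, -5.0])
print(contrastive_decode(expert, amateur))  # token where expert outshines amateur
print(int(np.argmax(expert)))               # plain greedy pick for comparison
```

In TeGu's setting, the amateur predictions are produced temporally via the MTP head rather than by a second forward pass of a smaller model, which is what keeps the added memory and compute low.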