Large language models (LLMs) have shown remarkable capabilities, but their reasoning abilities and the mechanisms behind them remain poorly understood. We present an approach that enhances LLM reasoning by optimizing the attention mechanism, without additional training data. We identify inefficiencies in the attention distribution caused by non-semantic tokens and propose an algorithm that re-balances the skewed distribution, enabling the model to abstract more nuanced knowledge. Our experiments show significantly improved reasoning capabilities, particularly on non-STEM questions. These results offer insight into the role of attention patterns in LLM reasoning and a method for strengthening it, paving the way for more powerful and versatile language models.
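To make the idea of re-balancing a skewed attention distribution concrete, the following is a minimal sketch and not the algorithm proposed in this work. It assumes that a single attention row is available as a list of probabilities, that the positions of non-semantic tokens are already known, and that damping those positions by a fixed factor and renormalizing is an acceptable stand-in. The function name `rebalance_attention`, the `non_semantic` set, and the `scale` parameter are all hypothetical.

```python
def rebalance_attention(weights, non_semantic, scale=0.5):
    """Damp attention on assumed non-semantic positions, then renormalize.

    weights:      one attention row, non-negative floats summing to 1.
    non_semantic: set of positions assumed to hold non-semantic tokens
                  (hypothetical input; how to find them is not covered here).
    scale:        factor in [0, 1] applied to those positions (hypothetical).
    """
    if not 0.0 <= scale <= 1.0:
        raise ValueError("scale must be in [0, 1]")
    # Shrink the mass on non-semantic positions; leave the rest untouched.
    damped = [w * scale if i in non_semantic else w
              for i, w in enumerate(weights)]
    total = sum(damped)
    if total == 0.0:
        # Nothing left to renormalize; return the row unchanged.
        return list(weights)
    # Renormalize so the row is a probability distribution again.
    return [w / total for w in damped]


# Illustrative row in which position 0 absorbs most of the attention mass.
row = [0.70, 0.10, 0.15, 0.05]
balanced = rebalance_attention(row, non_semantic={0}, scale=0.1)
```

In this sketch the relative ordering among the remaining tokens is preserved, since every untouched weight is divided by the same total; only the share taken by the damped positions shrinks.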