Recent advancements in large language models (LLMs) have significantly expanded their capabilities as tool agents. In this paper, we argue that a waveform pattern in the model's attention allocation affects tool-use performance, which degrades when essential information is positioned in a trough of that waveform. To address this issue, we propose a novel inference method named Attention Buckets. This approach lets an LLM process the context through multiple parallel runs, each using a distinct RoPE base angle that shapes its attention waveform. Attention Buckets ensures that an attention trough in one run is compensated by an attention peak in another, reducing the risk that the LLM misses essential information residing in a trough. Extensive experiments on a widely recognized tool-use benchmark demonstrate the efficacy of our approach: a 7B-parameter open-source model enhanced by Attention Buckets achieves state-of-the-art performance, on par with GPT-4.
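To make the peak/trough compensation idea concrete, the sketch below uses a simplified proxy that is our assumption, not the paper's exact measurement. It takes the RoPE attention score between an all-ones query and an all-ones key as a function of relative distance m. Each rotary pair i rotates by m·θ_i with θ_i = base^(−2i/d) and contributes 2·cos(m·θ_i). Changing the base shifts where the troughs fall. Taking the per-position maximum across several bases then shows that, at any distance, at least one run sits at or above each individual run's value. The helper names (`rope_waveform`, `compensated_envelope`, `trough_depth`) and the chosen bases are hypothetical. Only 10000 is the standard RoPE base; the others are arbitrary illustrative values.

```python
import math
from typing import Sequence


def rope_waveform(base: float, dim: int, max_pos: int) -> list[float]:
    """Proxy "attention waveform" under RoPE (illustrative assumption).

    Returns the dot product of an all-ones query and an all-ones key of head
    dimension `dim` after rotary embedding, for relative distances
    m = 0 .. max_pos - 1. Rotary pair i rotates by m * theta_i with
    theta_i = base ** (-2i / dim) and contributes 2 * cos(m * theta_i).
    """
    thetas = [base ** (-2 * i / dim) for i in range(dim // 2)]
    return [sum(2.0 * math.cos(m * t) for t in thetas) for m in range(max_pos)]


def compensated_envelope(waves: Sequence[Sequence[float]]) -> list[float]:
    """Per-position maximum across runs: a trough in one run is covered
    whenever another run has a higher value at the same distance."""
    return [max(vals) for vals in zip(*waves)]


def trough_depth(wave: Sequence[float], start: int = 1) -> float:
    """Lowest value of a waveform, ignoring distance 0 by default."""
    return min(wave[start:])


if __name__ == "__main__":
    # Hypothetical bases: 10000 is the standard RoPE base; the others are
    # arbitrary and NOT the values used in the paper.
    bases = [10_000.0, 15_000.0, 20_000.0, 25_000.0]
    dim, max_pos = 128, 2048
    waves = [rope_waveform(b, dim, max_pos) for b in bases]
    for b, w in zip(bases, waves):
        print(f"base={b:>8.0f}  trough={trough_depth(w):8.2f}")
    env = compensated_envelope(waves)
    print(f"envelope trough={trough_depth(env):8.2f}")
```

The per-position maximum is only a way to visualize the compensation argument. The abstract does not specify how the outputs of the parallel runs are merged during inference, so that step is not modeled here.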