Many studies have revealed that large language models (LLMs) exhibit uneven awareness of different contextual positions. This limited context awareness can cause critical information to be overlooked, leading to task failures. While several approaches have been proposed to enhance LLMs' context awareness, achieving both effectiveness and efficiency remains challenging. In this paper, for LLMs that use RoPE as their position embedding, we introduce a novel method called ``Mixture of In-Context Experts'' (MoICE) to address this challenge. MoICE comprises two key components: a router integrated into each attention head within the LLM, and a lightweight router-only training strategy. (1) MoICE views each RoPE angle as an ``in-context'' expert, shown to be capable of directing a head's attention to specific contextual positions. Each attention head therefore processes tokens flexibly, using multiple RoPE angles dynamically selected by its router to attend to the positions it needs, which mitigates the risk of overlooking essential contextual information. (2) The router-only training strategy freezes all LLM parameters and updates only the routers, for just a few steps. When applied to open-source LLMs including Llama and Mistral, MoICE surpasses prior methods on multiple long-context understanding and generation tasks, all while maintaining commendable inference efficiency.
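The mechanism described above can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the split-half RoPE rotation, gating the router on the mean query vector, and the particular choice of bases are all assumptions made for the sketch. It shows one attention head mixing attention outputs computed under several RoPE angles ("in-context experts"), with the router weights being the only trainable parameters.

```python
import numpy as np

def rope(x, base):
    """Apply a rotary position embedding with the given base to x of shape (seq, dim).

    Uses the split-half rotation convention; dim must be even.
    """
    seq, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) / half)        # per-pair rotation frequencies
    angles = np.outer(np.arange(seq), freqs)         # (seq, half) rotation angles
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def moice_head(q, k, v, bases, w_router):
    """One MoICE-style attention head (illustrative sketch).

    q, k, v: (seq, dim) projections for this head.
    bases: list of RoPE bases, each treated as one "in-context" expert.
    w_router: (dim, n_experts) router weights -- the only parameters
              that would be trained under the router-only strategy.
    """
    # Hypothetical gating signal: softmax over experts from the mean query.
    gate = softmax(q.mean(axis=0) @ w_router)        # (n_experts,)
    out = np.zeros_like(v)
    for g, base in zip(gate, bases):
        qr, kr = rope(q, base), rope(k, base)
        attn = softmax(qr @ kr.T / np.sqrt(q.shape[1]), axis=-1)
        out += g * (attn @ v)                        # mix expert outputs by gate weight
    return out, gate
```

Because each base rotates queries and keys at different rates, the same head can emphasize different contextual positions under different experts; the router's convex combination lets it cover several positions at once instead of committing to one fixed angle.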