Large language models (LLMs) have catalyzed a paradigm shift in natural language processing, yet their limited controllability poses a significant challenge for downstream applications. We aim to address this by drawing inspiration from the neural mechanisms of the human brain, specifically Broca's and Wernicke's areas, which are crucial for language generation and comprehension, respectively. In particular, Broca's area receives cognitive decision signals from Wernicke's area, treating language generation as an intricate decision-making process, which differs from the fully auto-regressive language generation of existing LLMs. In a similar vein, our proposed system, the BWArea model, conceptualizes language generation as a decision-making task. The model has three components: a language world model, an inverse dynamics model, and a cognitive policy. Like Wernicke's area, the inverse dynamics model is designed to deduce the underlying cognitive intention, or latent action, behind each token. Like existing LLMs, the BWArea model is amenable to both pre-training and fine-tuning. We trained a BWArea model on 30B clean pre-training tokens, and it performs competitively with LLMs of equal size (1B parameters). Unlike fully auto-regressive LLMs, its pre-training performance does not degrade when dirty data unintentionally appears. This shows the advantage of the BWArea model's decomposed structure in reducing the effort spent on laborious data selection and labeling. Finally, we show that the BWArea model offers enhanced controllability: fine-tuning only the cognitive policy with downstream reward metrics makes alignment simpler. On 9 out of 10 tasks from two suites, TextWorld and BigBench Hard, our method outperforms auto-regressive LLMs.