This paper presents an in-depth examination of the evolution and interplay of cognitive and expressive capabilities in large language models (LLMs), with a specific focus on Baichuan-7B and Baichuan-33B, an advanced bilingual (Chinese and English) LLM series. We define and explore the models' cognitive and expressive capabilities through linear representations across three critical phases: Pretraining, Supervised Fine-Tuning (SFT), and Reinforcement Learning from Human Feedback (RLHF). Cognitive capability is defined as the quantity and quality of information conveyed by the neuron output vectors within the network, analogous to neural signal processing in human cognition. Expressive capability is defined as the model's ability to produce word-level output. Our findings uncover a sequential development pattern: cognitive abilities are largely established during Pretraining, whereas expressive abilities predominantly advance during SFT and RLHF. Statistical analyses confirm a significant correlation between the two capabilities, suggesting that cognitive capacity may limit expressive potential. The paper also explores the theoretical underpinnings of these divergent developmental trajectories and their connection to the architectural design of LLMs. Moreover, we evaluate various optimization-independent strategies, such as few-shot learning and repeated sampling, that bridge the gap between cognitive and expressive capabilities. This research reveals a potential connection between the hidden space and the output space, contributing valuable insights into the interpretability and controllability of LLM training processes.