Recent advances in Large Language Models (LLMs) have demonstrated the emergence of capabilities (learned skills) when the number of system parameters and the size of the training data surpass certain thresholds. The exact mechanisms behind this phenomenon are not fully understood and remain a topic of active research. Inspired by the skill-text bipartite graph model proposed by Arora and Goyal for modeling semantic languages, we develop a mathematical theory that explains the emergence of learned skills while taking the learning (or training) process into account. Our approach models the learning of skills in the skill-text bipartite graph as an iterative decoding process in Low-Density Parity-Check (LDPC) codes and Irregular Repetition Slotted ALOHA (IRSA). Using density evolution analysis, we show that learned skills emerge when the ratio of the number of training texts to the number of skills exceeds a certain threshold. Our analysis also yields a scaling law for the testing error as a function of this ratio. Once training is complete, the associations among learned skills can also be acquired to form a skill association graph. We use site percolation analysis to derive the conditions for the existence of a giant component in the skill association graph. Our analysis extends to a setting with a hierarchy of skills, where a fine-tuned model is built upon a foundation model, and to a setting with multiple classes of skills and texts. As an important application, we propose a method for semantic compression and discuss its connections to semantic communication.
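To make the threshold behavior concrete, the following is a minimal sketch of a density evolution recursion of the kind used for peeling decoders in LDPC codes and IRSA. It is not the model analyzed in this work and rests on several illustrative assumptions: the number of skills per text and the number of texts per skill are both taken to be Poisson distributed, a text is assumed to teach a skill only when every other skill in that text has already been learned, and the names `unlearned_fraction`, `ratio`, and `mean_skills_per_text` are hypothetical. Under these assumptions, with R the ratio of training texts to skills and c the mean number of skills per text, the probability q that a skill is still unlearned satisfies q ← exp(−cR·exp(−cq)), iterated from q = 1.

```python
import math


def unlearned_fraction(ratio, mean_skills_per_text=3.0, max_iters=10_000, tol=1e-12):
    """Fixed point of a simplified density evolution recursion.

    Assumptions (illustrative only, not taken from the paper):
      * Poisson degrees on both sides of the skill-text bipartite graph.
      * A text teaches a skill only if all other skills in it are learned.

    ratio: number of training texts divided by number of skills (R).
    mean_skills_per_text: mean number of skills per text (c), hypothetical.
    Returns the limiting fraction of skills that remain unlearned.
    """
    c = mean_skills_per_text
    q = 1.0  # before training, every skill is unlearned
    for _ in range(max_iters):
        # exp(-c * q): probability that all other skills in a text are learned,
        # so the text can teach the skill on the edge under consideration.
        teachable = math.exp(-c * q)
        # A skill stays unlearned if none of its Poisson(c * R) texts can teach it.
        q_next = math.exp(-c * ratio * teachable)
        if abs(q_next - q) < tol:
            return q_next
        q = q_next
    return q


if __name__ == "__main__":
    for ratio in (0.1, 0.5, 1.0, 2.0, 5.0):
        frac = unlearned_fraction(ratio)
        print(f"R = {ratio:4.1f}  unlearned fraction ~ {frac:.4f}")
```

Sweeping `ratio` shows the unlearned fraction staying close to 1 for small R and falling close to 0 for large R, which is the qualitative emergence behavior described above. The location of the transition depends on the assumed degree distributions, so the numbers from this sketch should not be read as the thresholds derived in the analysis.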