We present Entropic Mutual-Information Geometry Large-Language Model Alignment (ENIGMA), a novel approach to Large-Language Model (LLM) training that jointly improves reasoning, alignment, and robustness by treating an organisation's policies and principles as directions along which to move on a model's information manifold. Our single-loop trainer combines three components: Group-Relative Policy Optimisation (GRPO), an on-policy, critic-free RL method with Chain-of-Thought (CoT)-format-only rewards; a Self-Supervised Alignment with Mutual Information (SAMI)-style symmetric InfoNCE auxiliary objective; and an entropic Sinkhorn optimal-transport regulariser on hidden-state distributions that bounds geometric drift. We also introduce InfoNCE-based metrics, which specialise to a standard mutual-information lower bound under matched negatives, to measure how strongly a model's CoT encodes these policies. These metrics include a Sufficiency Index (SI) that enables principles to be selected and created, prior to training, so as to maximise downstream performance. In our experiments with small (1B-parameter) LLMs, high-SI principles predict steadier training dynamics and improved benchmark performance over GRPO ablations. Our information-geometry analysis of the trained models confirms desirable structural change in the manifold. These results support our hypothesis that reasoning, alignment, and robustness are projections of a single information-geometric objective, and that models trained with ENIGMA demonstrate principled reasoning without the use of a reward model, offering a path to trusted capability.
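The InfoNCE metrics above specialise to a standard mutual-information lower bound when each positive pair shares the same pool of negatives. A minimal sketch of such a symmetric InfoNCE lower-bound estimator is given below; this is our own illustration, and the scoring function `scores[i][j]` (how a policy and a CoT are scored against each other) is an assumption, not the paper's exact formulation:

```python
import math

def symmetric_infonce(scores):
    """Symmetric InfoNCE lower bound on mutual information.

    scores[i][j] is a compatibility score between item i of one view
    (e.g. a principle) and item j of the other (e.g. a CoT); diagonal
    entries are the matched (positive) pairs, off-diagonals are the
    shared negatives. For a batch of size K, each directional bound is
    log K + mean_i(scores[i][i] - logsumexp_j(scores[i][j])), and the
    symmetric bound averages the row-wise and column-wise directions.
    """
    k = len(scores)

    def directional_bound(mat):
        total = 0.0
        for i, row in enumerate(mat):
            log_z = math.log(sum(math.exp(s) for s in row))
            total += row[i] - log_z
        return math.log(k) + total / k

    # Column-wise direction: transpose the score matrix.
    cols = [list(c) for c in zip(*scores)]
    return 0.5 * (directional_bound(scores) + directional_bound(cols))
```

The bound is tight at `log K` and equals zero when scores carry no information about the pairing, which is what makes it usable as a pre-training diagnostic such as the SI.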