We present Entropic Mutual-Information Geometry Large-Language Model Alignment (ENIGMA), a novel approach to Large-Language Model (LLM) training that jointly improves reasoning, alignment, and robustness by treating an organisation's policies and principles as directions along which to move on a model's information manifold. Our single-loop trainer combines three components: Group-Relative Policy Optimisation (GRPO), an on-policy, critic-free RL method with Chain-of-Thought (CoT)-format-only rewards; a Self-Supervised Alignment with Mutual Information (SAMI)-style symmetric InfoNCE auxiliary objective; and an entropic Sinkhorn optimal-transport regulariser on hidden-state distributions that bounds geometric drift. We also introduce InfoNCE-based metrics, which specialise to a standard mutual-information lower bound under matched negatives, to measure how strongly a model's CoT encodes these policies. These metrics include a Sufficiency Index (SI) that enables principles to be selected and created, prior to training, so as to maximise downstream performance. In our experiments with small (1B-parameter) LLMs, high-SI principles predict steadier training dynamics and improved benchmark performance over GRPO ablations. Our information-geometry analysis of the trained models confirms desirable structural change in the manifold. These results support our hypothesis that reasoning, alignment, and robustness are projections of a single information-geometric objective, and that models trained with ENIGMA demonstrate principled reasoning without the use of a reward model, offering a path to trusted capability.
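The InfoNCE metrics above specialise to a standard mutual-information lower bound when each positive pair shares the same pool of negatives. A minimal sketch of such a symmetric InfoNCE lower-bound estimator is given below; this is our own illustration, and the scoring function `scores[i][j]` (how a policy and a CoT are scored against each other) is an assumption, not the paper's exact formulation:

```python
import math

def symmetric_infonce(scores):
    """Symmetric InfoNCE lower bound on mutual information.

    scores[i][j] is a compatibility score between item i of one view
    (e.g. a principle) and item j of the other (e.g. a CoT); diagonal
    entries are the matched (positive) pairs, off-diagonals are the
    shared negatives. For a batch of size K, each directional bound is
    log K + mean_i(scores[i][i] - logsumexp_j(scores[i][j])), and the
    symmetric bound averages the row-wise and column-wise directions.
    """
    k = len(scores)

    def directional_bound(mat):
        total = 0.0
        for i, row in enumerate(mat):
            log_z = math.log(sum(math.exp(s) for s in row))
            total += row[i] - log_z
        return math.log(k) + total / k

    # Column-wise direction: transpose the score matrix.
    cols = [list(c) for c in zip(*scores)]
    return 0.5 * (directional_bound(scores) + directional_bound(cols))
```

The bound is tight at `log K` and equals zero when scores carry no information about the pairing, which is what makes it usable as a pre-training diagnostic such as the SI.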