The pursuit of Artificial General Intelligence (AGI) is a central goal in language model development, and consciousness-like processing could serve as a key facilitator. While current language models are not conscious, they exhibit behaviors analogous to certain aspects of consciousness. This paper investigates the implementation of a leading theory of consciousness, Integrated Information Theory (IIT), within language models via a reward-based learning paradigm. IIT provides a formal, axiom-based mathematical framework for quantifying consciousness. Drawing inspiration from its core principles, we formulate a novel reward function that quantifies a text's causality, coherence, and integration, characteristics associated with conscious processing. Empirically, we find that optimizing for this IIT-inspired reward leads to more concise text generation. On out-of-domain tasks, careful tuning achieves up to a 31% reduction in output length while preserving accuracy comparable to the base model. Beyond primary task performance, we analyze the broader effects of this training methodology on the model's confidence calibration and test-time computational scaling. The proposed framework offers significant practical advantages: it is conceptually simple, computationally efficient, requires no external data or auxiliary models, and leverages a general, capability-driven signal rather than task-specific heuristics. Code is available at https://github.com/MH-Sameti/LLM_PostTraining.git