Foundation models exhibit significant capabilities in decision-making and logical deduction. Nonetheless, debate continues over whether they genuinely understand the world or merely perform stochastic mimicry. This paper closely examines a simple transformer trained to play Othello, extending prior research to better understand the emergent world model of Othello-GPT. The investigation reveals that Othello-GPT encodes a linear representation of opposing pieces, and that this representation causally steers its decision-making. The paper further clarifies the interplay between the linear world representation and causal decision-making, and how both depend on layer depth and model complexity. We have made the code public.