Large Language Models (LLMs) often exhibit a gap between their internal knowledge and their explicit linguistic outputs. In this report, we empirically investigate whether Looped Transformers (LTs)--architectures that increase computational depth by iterating shared layers--can bridge this gap by exploiting their iterative nature as a form of introspection. Our experiments reveal that while increasing the number of loop iterations narrows the gap, this narrowing is partly driven by a degradation of the internal knowledge carried by their representations. Moreover, a further empirical analysis suggests that current LTs' ability to perceive their representations does not improve across loop iterations; it is only present at the final loop. These results suggest that while LTs offer a promising direction for scaling computational depth, they have yet to achieve the introspection required to truly link representation space and natural language.
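To make the looped-layer idea concrete, the following is a minimal PyTorch sketch, not the authors' implementation: a single shared block is applied for a configurable number of loop iterations, so effective depth grows with the loop count while the parameter count stays fixed. The class name `LoopedTransformer`, the hyperparameters, and the `n_loops` argument are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LoopedTransformer(nn.Module):
    """Toy looped Transformer: one shared block iterated n_loops times (illustrative sketch)."""

    def __init__(self, d_model=256, n_heads=4, n_loops=4, vocab_size=32000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # A single block whose weights are reused at every loop iteration,
        # so computational depth scales with n_loops, not with parameter count.
        self.shared_block = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        self.n_loops = n_loops
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, token_ids, n_loops=None):
        h = self.embed(token_ids)
        # Iterating the shared block increases depth; the intermediate
        # states h are what per-loop probing analyses would inspect.
        for _ in range(n_loops or self.n_loops):
            h = self.shared_block(h)
        return self.lm_head(h)

# Usage: the number of loop iterations can be varied at inference time.
model = LoopedTransformer()
logits = model(torch.randint(0, 32000, (1, 16)), n_loops=8)
```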