Recent large language models (LLMs) have shown strong natural language understanding abilities. Since most of them share the same basic structure, i.e., the transformer block, possible contributors to their success during training are scaling and instruction tuning. However, how these factors affect the models' language perception is unclear. This work compares the self-attention of several existing LLMs (LLaMA, Alpaca, and Vicuna) of different sizes (7B, 13B, 30B, 65B) with eye saccades, an aspect of human reading attention, to assess the effect of scaling and instruction tuning on language perception. Results show that scaling enhances human resemblance and improves effective attention by reducing reliance on trivial patterns, while instruction tuning does not. However, instruction tuning significantly enhances the models' sensitivity to instructions. We also find that current LLMs are consistently closer to non-native than to native speakers in attention, suggesting sub-optimal language perception in all models. The code and data used in the analysis are available on GitHub.