We are beginning to see progress in language-model-assisted scientific discovery. Motivated by the use of LLMs as general scientific assistants, this paper assesses the domain knowledge of LLMs through their understanding of the different mathematical skills required to solve problems. In particular, we look not just at what the pre-trained model already knows, but at how it has learned to learn from information during in-context learning or instruction-tuning by exploiting the complex knowledge structure within mathematics. Motivated by the Neural Tangent Kernel (NTK), we propose \textit{NTKEval} to assess changes in an LLM's probability distribution via training on different kinds of math data. Our systematic analysis finds evidence of domain understanding during in-context learning. By contrast, certain forms of instruction-tuning lead to similar performance changes irrespective of the data used for training, suggesting a lack of domain understanding across different skills.