For language models to aid physics research, they must first encode representations of mathematical and natural language discourse that lead to coherent explanations, in which statements are correctly ordered and relevant. We present a collection of datasets developed to evaluate language models in this regard, measuring capabilities in sentence ordering, position prediction, section prediction, and discourse coherence. Analysis of the data reveals the equations and sub-disciplines most common in physics discourse, as well as the sentence-level frequency of equations and expressions. We present baselines demonstrating that contemporary language models are challenged by coherence-related tasks in physics, even when trained on mathematical natural language objectives.