MatSci-NLP: Evaluating Scientific Language Models on Materials Science Language Tasks Using Text-to-Schema Modeling

We present MatSci-NLP, a natural language benchmark for evaluating the performance of natural language processing (NLP) models on materials science text. We construct the benchmark from publicly available materials science text data to encompass seven different NLP tasks, including conventional NLP tasks such as named entity recognition and relation classification, as well as NLP tasks specific to materials science, such as synthesis action retrieval, which relates to creating synthesis procedures for materials. We study various BERT-based models pretrained on different scientific text corpora on MatSci-NLP to understand the impact of pretraining strategies on understanding materials science text. Given the scarcity of high-quality annotated data in the materials science domain, we perform our fine-tuning experiments with limited training data to encourage generalization across MatSci-NLP tasks. Our experiments in this low-resource training setting show that language models pretrained on scientific text outperform BERT trained on general text. MatBERT, a model pretrained specifically on materials science journals, performs best on most tasks. Moreover, we propose a unified text-to-schema approach for multitask learning on MatSci-NLP and compare its performance with traditional fine-tuning methods. In our analysis of different training methods, we find that our proposed text-to-schema methods, inspired by question answering, consistently outperform single-task and multitask NLP fine-tuning methods. The code and datasets are publicly available at \url{https://github.com/BangLab-UdeM-Mila/NLP4MatSci-ACL23}.
