Large language models (LLMs) are increasingly being used in materials science. However, little attention has been given to benchmarking and standardized evaluation for LLM-based materials property prediction, which hinders progress. We present LLM4Mat-Bench, the largest benchmark to date for evaluating the performance of LLMs in predicting the properties of crystalline materials. LLM4Mat-Bench contains about 1.9M crystal structures in total, collected from 10 publicly available materials data sources, and covers 45 distinct properties. LLM4Mat-Bench features three input modalities: crystal composition, CIF, and crystal text description, with 4.7M, 615.5M, and 3.1B tokens in total for each modality, respectively. We use LLM4Mat-Bench to fine-tune models of different sizes, including LLM-Prop and MatBERT, and provide zero-shot and few-shot prompts to evaluate the property prediction capabilities of chat-style LLMs, including Llama, Gemma, and Mistral. The results highlight the challenges general-purpose LLMs face in materials science and the need for task-specific predictive models and task-specific instruction-tuned LLMs for materials property prediction.
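To make the zero-shot evaluation setup concrete, below is a minimal sketch of how a single-turn prompt for a chat-style LLM might be formatted and its reply parsed when predicting a numeric property from a composition string. The prompt wording, property name, and parsing logic are illustrative assumptions, not the benchmark's exact templates.

```python
import re


def build_zero_shot_prompt(composition: str, property_name: str, unit: str) -> str:
    """Compose a single-turn prompt asking for a numeric property value.

    This is a hypothetical template for illustration; LLM4Mat-Bench's
    actual prompts may differ in wording and structure.
    """
    return (
        "You are a materials science assistant.\n"
        f"Given the crystal composition {composition}, predict its "
        f"{property_name} in {unit}. Answer with a single number only."
    )


def parse_numeric_answer(text: str) -> float | None:
    """Extract the first number from the model's reply, if any."""
    match = re.search(r"-?\d+(?:\.\d+)?", text)
    return float(match.group()) if match else None


if __name__ == "__main__":
    prompt = build_zero_shot_prompt("SrTiO3", "band gap", "eV")
    print(prompt)
    # A real run would send `prompt` to a chat LLM such as Llama, Gemma,
    # or Mistral and pass the reply to parse_numeric_answer(); here we
    # only demonstrate the parsing step on a sample reply.
    print(parse_numeric_answer("The band gap is approximately 3.2 eV."))
```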