Recently, there has been a significant upsurge of interest in leveraging large language models (LLMs) to assist scientific discovery. However, most LLMs focus only on general science and lack domain-specific knowledge, such as that of chemical molecules and amino acid sequences. To bridge this gap, we introduce SciDFM, a mixture-of-experts LLM trained from scratch that is able to conduct college-level scientific reasoning and to understand molecules and amino acid sequences. We collect a large-scale training corpus containing numerous scientific papers and books from different disciplines, as well as data from domain-specific databases. We further fine-tune the pre-trained model on abundant instruction-following data to improve performance on downstream benchmarks. Experimental results show that SciDFM achieves strong performance on general scientific benchmarks such as SciEval and SciQ, and reaches state-of-the-art performance on domain-specific benchmarks among models of similar size. We further analyze the expert layers and show that the results of expert selection vary with data from different disciplines. To benefit the broader research community, we open-source SciDFM at https://huggingface.co/OpenDFM/SciDFM-MoE-A5.6B-v1.0.