AstroLLaMA: Towards Specialized Foundation Models in Astronomy

Tuan Dung Nguyen, Yuan-Sen Ting, Ioana Ciucă, Charlie O'Neill, Ze-Chang Sun, Maja Jabłońska, Sandor Kruk, Ernest Perkowski, Jack Miller, Jason Li, Josh Peek, Kartheik Iyer, Tomasz Różański, Pranav Khetarpal, Sharaf Zaman, David Brodrick, Sergio J. Rodríguez Méndez, Thang Bui, Alyssa Goodman, Alberto Accomazzi, Jill Naiman, Jesse Cranney, Kevin Schawinski, UniverseTBD

From arXiv. 6 pages, 3 figures; submitted to IJCNLP-AACL 2023. Comments are welcome. The model is available on Hugging Face: https://huggingface.co/universeTBD/astrollama

Large language models excel at many human-language tasks but often falter in highly specialized domains such as scholarly astronomy. To bridge this gap, we introduce AstroLLaMA, a 7-billion-parameter model fine-tuned from LLaMA-2 on over 300,000 astronomy abstracts from arXiv. Optimized for traditional causal language modeling, AstroLLaMA achieves 30% lower perplexity than LLaMA-2, indicating marked domain adaptation. Our model produces more insightful and scientifically relevant text completions, and better embeddings, than state-of-the-art foundation models despite having significantly fewer parameters. AstroLLaMA serves as a robust, domain-specific model with broad fine-tuning potential. Its public release aims to spur astronomy-focused research, including automatic paper summarization and conversational agent development.
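To make the "30% lower perplexity" figure concrete, the sketch below uses the standard definition of perplexity for a causal language model: the exponential of the mean negative log-likelihood of each token given the tokens before it, PPL = exp(-(1/N) Σ log p(x_t | x_<t)). This is the textbook formula, not necessarily the exact evaluation protocol used for AstroLLaMA. The function names and the per-token probabilities are hypothetical, invented only to show the arithmetic; they are not results from the paper.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity of one sequence under a causal language model.

    token_log_probs[t] is the natural-log probability the model assigned
    to token t given all preceding tokens, i.e. log p(x_t | x_<t).
    """
    if not token_log_probs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)  # mean cross-entropy
    return math.exp(mean_nll)


def relative_reduction(ppl_new: float, ppl_base: float) -> float:
    """Fractional drop of ppl_new relative to ppl_base (0.3 means 30% lower)."""
    return 1.0 - ppl_new / ppl_base


# Hypothetical probabilities two models assign to the same four tokens of a
# held-out abstract (illustrative only). Each "tuned" probability is the
# "base" one divided by 0.7, so tuned perplexity is 0.7x the base perplexity.
base_probs = [0.07, 0.14, 0.35, 0.21]
tuned_probs = [0.10, 0.20, 0.50, 0.30]

ppl_base = perplexity([math.log(p) for p in base_probs])
ppl_tuned = perplexity([math.log(p) for p in tuned_probs])
print(f"base: {ppl_base:.2f}  tuned: {ppl_tuned:.2f}  "
      f"reduction: {relative_reduction(ppl_tuned, ppl_base):.0%}")
```

Because perplexity is the inverse geometric mean of the per-token probabilities, a 30% reduction means the model assigns, on average, roughly 1/0.7 ≈ 1.43 times higher probability to each observed token. In practice the mean cross-entropy is usually taken directly from the model's training loss, so perplexity is simply the exponential of that loss.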
