Models such as GPT-4 and Med-PaLM 2 have demonstrated impressive performance on a wide variety of biomedical NLP tasks. However, these models have hundreds of billions of parameters, are computationally expensive to run, require users to send their input data over the internet, and are trained on unknown data sources. Can smaller, more targeted models compete? To address this question, we build and release BioMedLM, a 2.7 billion parameter GPT-style autoregressive model trained exclusively on PubMed abstracts and full articles. When fine-tuned, BioMedLM produces strong multiple-choice biomedical question-answering results that are competitive with much larger models, for example 57.3% on MedMCQA (dev) and 69.0% on the MMLU Medical Genetics exam. BioMedLM can also be fine-tuned to produce useful answers to patient questions on medical topics. These results demonstrate that smaller models can potentially serve as transparent, privacy-preserving, economical, and environmentally friendly foundations for particular NLP applications, such as those in biomedicine. The model is available on the Hugging Face Hub: https://huggingface.co/stanford-crfm/BioMedLM.
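To make the cost contrast concrete, the following minimal sketch estimates the memory needed to hold model weights alone. It is a back-of-the-envelope illustration, not a measurement: the 2.7 billion figure comes from the abstract, while the 200 billion value is a hypothetical stand-in for "hundreds of billions" (the actual sizes of GPT-4 and Med-PaLM 2 are not stated here). The helper name `weight_memory_gb` is likewise illustrative, and the estimate ignores activations, optimizer state, and caches.

```python
# Bytes needed to store one parameter in common numeric formats.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(num_params: float, dtype: str = "fp16") -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


BIOMEDLM_PARAMS = 2.7e9      # parameter count stated in the abstract
LARGE_MODEL_PARAMS = 200e9   # ASSUMPTION: illustrative value for "hundreds of billions"

if __name__ == "__main__":
    for name, n in [("BioMedLM", BIOMEDLM_PARAMS),
                    ("hypothetical large model", LARGE_MODEL_PARAMS)]:
        for dtype in ("fp32", "fp16"):
            print(f"{name:>25s} {dtype}: {weight_memory_gb(n, dtype):8.1f} GB")
```

Under these assumptions, the 2.7 billion parameter weights occupy roughly 5.4 GB in 16-bit precision, which fits on a single commodity GPU, whereas a model with hundreds of billions of parameters requires hundreds of gigabytes and therefore multi-device serving.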