ChatGPT is a large language model developed by OpenAI. Despite its impressive performance across a wide range of tasks, no prior work has investigated its capabilities in the biomedical domain. This paper therefore evaluates ChatGPT on several benchmark biomedical tasks, including relation extraction, document classification, question answering, and summarization. To the best of our knowledge, this is the first work to conduct an extensive evaluation of ChatGPT in the biomedical domain. Interestingly, our evaluation shows that on biomedical datasets with smaller training sets, zero-shot ChatGPT even outperforms state-of-the-art fine-tuned generative transformer models such as BioGPT and BioBART. This suggests that pre-training on large text corpora makes ChatGPT quite specialized even in the biomedical domain. Our findings demonstrate that ChatGPT has the potential to be a valuable tool for biomedical tasks that lack large annotated datasets.