Taiyi: A Bilingual Fine-Tuned Large Language Model for Diverse Biomedical Tasks

Ling Luo, Jinzhong Ning, Yingwen Zhao, Zhijun Wang, Zeyuan Ding, Peng Chen, Weiru Fu, Qinyu Han, Guangtao Xu, Yunzhi Qiu, Dinghao Pan, Jiru Li, Hao Li, Wenduo Feng, Senbo Tu, Yuqi Liu, Zhihao Yang, Jian Wang, Yuanyuan Sun, Hongfei Lin

Objective: Most existing fine-tuned biomedical large language models (LLMs) focus on improving performance on monolingual biomedical question answering and conversation tasks. To investigate the effectiveness of fine-tuned LLMs on diverse biomedical NLP tasks in different languages, we present Taiyi, a bilingual fine-tuned LLM for diverse biomedical tasks. Materials and Methods: We first curated a comprehensive collection of 140 existing biomedical text mining datasets (102 English and 38 Chinese) spanning more than 10 task types. We then propose a two-stage supervised fine-tuning strategy to optimize model performance across these varied tasks. Results: Experimental results on 13 test sets covering named entity recognition, relation extraction, text classification, and question answering show that Taiyi outperforms general LLMs. A case study involving additional biomedical NLP tasks further shows Taiyi's considerable potential for bilingual biomedical multi-tasking. Conclusion: Leveraging rich, high-quality biomedical corpora and developing effective fine-tuning strategies can significantly improve the performance of LLMs in the biomedical domain. Taiyi demonstrates bilingual multi-tasking capability acquired through supervised fine-tuning. However, tasks such as information extraction, which are not generative in nature, remain challenging for LLM-based generative approaches, which still underperform conventional discriminative approaches built on smaller language models.
