Large language models (LLMs) have shown remarkable capabilities in memorizing and presenting knowledge. However, when it comes to domain-specific knowledge and downstream tasks such as medicine, general LLMs are often unable to give precise answers. In addition, when people want LLMs to answer classification questions, they usually perform instruction tuning first; however, LLMs do not always output a direct index of the category after instruction tuning. In this paper, we propose LlamaCare, a fine-tuned medical language model, and Extended Classification Integration (ECI), a module that handles classification problems for LLMs. Our contributions are: (i) We fine-tuned a large language model on medical knowledge with very low carbon emissions, achieving performance similar to ChatGPT using a single 24 GB GPU. (ii) We solved the problem of redundant categorical answers and improved the performance of LLMs by proposing a new module called Extended Classification Integration. (iii) We released our processed data for one-shot and few-shot training on benchmarks such as PubMedQA and USMLE Steps 1-3. Our method achieves performance close to state-of-the-art models on these benchmarks while consuming fewer GPU resources than LLMs with the same number of parameters. Our models, code, and datasets can be found at https://github.com/Stephen-SMJ/LLamaCare