Large language models (LLMs) have shown substantial progress in natural language understanding and generation, proving especially valuable in the medical field. Despite these advances, challenges persist due to the complexity and diversity inherent in medical tasks, which can be categorized as knowledge-intensive tasks and alignment-required tasks. Previous approaches either ignore the latter or focus on a minority of tasks and thus lose generalization. To address these drawbacks, we propose a progressive fine-tuning pipeline. In the first stage, this pipeline employs a Knowledge Aggregator to encode diverse knowledge and a Noise Aggregator to filter out detrimental information. In the second stage, we drop the Noise Aggregator to avoid interference from its suboptimal representations and leverage an additional alignment module, optimized in a direction orthogonal to the knowledge space, to mitigate knowledge forgetting. Based on this two-stage paradigm, we propose a Medical LLM through decoupling Clinical Alignment and Knowledge Aggregation (MedCare), which achieves state-of-the-art (SOTA) performance on over 20 medical tasks, as well as SOTA results on specific medical alignment tasks. MedCare models of various sizes (1.8B, 7B, 14B) all demonstrate significant improvements over existing models of similar size.
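The abstract does not specify implementation details, so the following is only a minimal PyTorch sketch of the stage-two idea: an alignment adapter whose update is penalized for overlapping with a frozen knowledge subspace. The class name `OrthogonalAlignmentAdapter`, the low-rank adapter form, the basis construction, and the penalty weight are all illustrative assumptions, not the paper's actual method or API.

```python
import torch
import torch.nn as nn

class OrthogonalAlignmentAdapter(nn.Module):
    """Hypothetical sketch: a low-rank alignment module whose update
    direction is discouraged from lying inside a frozen knowledge subspace."""
    def __init__(self, hidden: int, rank: int, knowledge_basis: torch.Tensor):
        super().__init__()
        # knowledge_basis: (hidden, k) orthonormal basis of the frozen
        # knowledge subspace (e.g., obtained from stage-1 adapter weights).
        self.register_buffer("knowledge_basis", knowledge_basis)
        self.down = nn.Linear(hidden, rank, bias=False)
        self.up = nn.Linear(rank, hidden, bias=False)
        nn.init.zeros_(self.up.weight)  # start as an identity residual

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual low-rank update on top of the frozen backbone features.
        return x + self.up(self.down(x))

    def orthogonality_penalty(self) -> torch.Tensor:
        # Penalize the component of the alignment update inside the
        # knowledge subspace: || B^T W_up ||_F^2 (zero iff orthogonal).
        overlap = self.knowledge_basis.t() @ self.up.weight
        return overlap.pow(2).sum()

# Usage sketch: add the penalty to the stage-2 alignment loss.
hidden, rank, k = 1024, 8, 16
basis, _ = torch.linalg.qr(torch.randn(hidden, k))  # stand-in basis
adapter = OrthogonalAlignmentAdapter(hidden, rank, basis)
x = torch.randn(2, hidden)
task_loss = adapter(x).pow(2).mean()  # placeholder for the alignment objective
loss = task_loss + 0.1 * adapter.orthogonality_penalty()  # assumed weight
loss.backward()
```

Under these assumptions, driving the penalty toward zero keeps the alignment update in the orthogonal complement of the knowledge space, which is one plausible way to realize the stated goal of mitigating knowledge forgetting.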