General-purpose large language models (LLMs) such as ChatGPT have shown remarkable success. However, such LLMs have not been widely adopted for medical purposes because of their poor accuracy and inability to provide medical advice. We propose IvyGPT, an LLM based on LLaMA that is trained and fine-tuned with high-quality medical question-answer (QA) instances and Reinforcement Learning from Human Feedback (RLHF). After supervised fine-tuning, IvyGPT has good multi-turn conversation capabilities, but it cannot yet perform like a doctor in other respects, such as comprehensive diagnosis. Through RLHF, IvyGPT can output richer diagnosis and treatment answers that are closer to those of a human doctor. During training, we used QLoRA to train a model with 33 billion parameters on a small number of NVIDIA A100 (80GB) GPUs. Experimental results show that IvyGPT outperforms other medical GPT models.
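The claim that a 33-billion-parameter model can be trained on a small number of 80GB GPUs can be made concrete with a rough back-of-the-envelope estimate. The sketch below is illustrative only: it assumes the frozen base weights are stored at 4 bits per parameter, which is typical for QLoRA but is not stated in the abstract, and it counts weight storage alone, ignoring adapter weights, optimizer state, activations, and quantization constants. The helper name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params: int, bits_per_param: float) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 33_000_000_000  # 33 billion parameters, as stated in the abstract
A100_MEMORY_GB = 80          # per-GPU memory, as stated in the abstract

# Assumption: 16-bit weights as the unquantized baseline for comparison.
fp16_gb = weight_memory_gb(NUM_PARAMS, 16)

# Assumption: 4-bit quantized base weights, typical for QLoRA.
four_bit_gb = weight_memory_gb(NUM_PARAMS, 4)

print(f"16-bit weights: {fp16_gb:.1f} GB")
print(f"4-bit weights:  {four_bit_gb:.1f} GB")
print(f"4-bit weights fit on one A100: {four_bit_gb < A100_MEMORY_GB}")
```

Under these assumptions, the 16-bit weights alone would occupy most of a single 80GB card, leaving little room for training state, whereas the 4-bit weights take roughly a quarter of that. This is only a lower bound on real memory use, not a measurement of the authors' setup.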