The primary aim of this research was to address the limitations in the medical knowledge of prevalent large language models (LLMs) such as ChatGPT by creating a specialized language model with enhanced accuracy in medical advice. We achieved this by adapting and fine-tuning the Large Language Model Meta AI (LLaMA) on a dataset of 100,000 patient-doctor dialogues sourced from a widely used online medical consultation platform. These conversations were cleaned and anonymized to protect patient privacy. In addition to fine-tuning, we incorporated a self-directed information retrieval mechanism that allows the model to access and use real-time information from online sources such as Wikipedia as well as data from curated offline medical databases. Fine-tuning on real-world patient-doctor interactions significantly improved the model's ability to understand patient needs and provide informed advice, and equipping it with self-directed retrieval from reliable online and offline sources substantially improved the accuracy of its responses. Our proposed model, ChatDoctor, represents a significant advancement in medical LLMs, both in understanding patient inquiries and in providing accurate advice. Given the high stakes and low error tolerance in the medical field, such improvements in the accuracy and reliability of information are not only beneficial but essential.
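To make the retrieval idea concrete, the following is a minimal sketch of how a patient question might be matched against an offline knowledge base and folded into a prompt before generation. It is an illustration under stated assumptions, not the ChatDoctor implementation: the `OFFLINE_DB` entries, the stopword list, the word-overlap scoring, and the names `extract_keywords`, `retrieve`, and `build_prompt` are all hypothetical, and the call to the language model itself is omitted.

```python
import re
from typing import Dict, List

# Hypothetical offline knowledge base: condition name -> short reference text.
OFFLINE_DB: Dict[str, str] = {
    "migraine": "Migraine: recurrent headache, often one-sided, with nausea and light sensitivity.",
    "influenza": "Influenza: viral infection with fever, cough, sore throat and muscle aches.",
}

# Hypothetical stopword list used to keep only content-bearing words.
STOPWORDS = {"i", "have", "a", "the", "and", "my", "is", "what", "of", "for", "with", "do", "it"}


def extract_keywords(question: str) -> List[str]:
    """Lowercase the question and drop stopwords (a simple stand-in for keyword extraction)."""
    words = re.findall(r"[a-z]+", question.lower())
    return [w for w in words if w not in STOPWORDS]


def retrieve(question: str, db: Dict[str, str], top_k: int = 1) -> List[str]:
    """Return up to top_k entries ranked by word overlap with the question."""
    keywords = set(extract_keywords(question))
    scored = []
    for name, text in db.items():
        entry_words = set(re.findall(r"[a-z]+", (name + " " + text).lower()))
        score = len(keywords & entry_words)
        if score > 0:
            scored.append((score, name, text))
    # Highest overlap first; ties broken alphabetically by entry name.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [text for _, _, text in scored[:top_k]]


def build_prompt(question: str, db: Dict[str, str]) -> str:
    """Prepend retrieved reference text to the patient question."""
    passages = retrieve(question, db)
    context = "\n".join(passages) if passages else "(no reference found)"
    return f"Reference:\n{context}\n\nPatient: {question}\nDoctor:"


if __name__ == "__main__":
    print(build_prompt("I have a fever and a cough", OFFLINE_DB))
```

The same `build_prompt` step could take passages from an online source instead of the dictionary; only the `retrieve` function would need to change, which is the reason the sketch keeps retrieval and prompt construction separate.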