Large language models (LLMs) have demonstrated remarkable adaptability, performing well on tasks for which they were not explicitly trained. Despite these natural language processing (NLP) capabilities, effectively aligning LLMs remains a central challenge when deploying them for specific clinical applications. To be eligible for use in clinical medicine, an LLM must generate factually accurate content and carry out non-trivial reasoning steps. Combining instruction-tuning with in-prompt strategies such as few-shot and chain-of-thought prompting has significantly improved LLM performance. Our proposed alignment strategy for medical question-answering, 'expand-guess-refine', offers a parameter- and data-efficient solution. In a preliminary analysis, the method achieved a score of 70.63% on a subset of questions sourced from the USMLE dataset.