Large Language Models (LLMs) have demonstrated remarkable adaptability, performing well on tasks for which they were not explicitly trained. However, despite their impressive natural language processing (NLP) capabilities, effective alignment remains a crucial challenge when deploying LLMs for specific clinical applications. To be eligible for use in clinical medicine, an LLM must be able to generate factually accurate responses and to carry out non-trivial reasoning steps. Combining instruction tuning with in-prompt strategies such as few-shot and chain-of-thought prompting has significantly enhanced the performance of LLMs. Our proposed alignment strategy for medical question answering, 'expand-guess-refine', offers a parameter- and data-efficient solution. In a preliminary analysis, the method achieved a score of 70.63% on a subset of questions sourced from the USMLE dataset.