Large language models (LLMs) have demonstrated significant potential to transform healthcare by automating tasks such as clinical documentation, information retrieval, and decision support. In this regard, carefully engineered prompts have emerged as a powerful tool for applying LLMs to medical settings, e.g., patient clinical scenarios. In this paper, we propose a modified, subjective version of the MedQA-USMLE dataset that mimics real-life clinical scenarios. We explore Chain of Thought (CoT) reasoning for subjective response generation on the modified MedQA-USMLE dataset, using appropriate LM-driven forward reasoning to arrive at correct responses to the medical questions. Given the importance of response verification in the medical setting, we use a reward training mechanism in which the language model also provides an appropriate verified response for a given response to a clinical question. We also include a human in the loop for different aspects of the evaluation. We develop better in-context learning strategies by modifying the 5-shot-codex-CoT-prompt from arXiv:2207.08143 for the subjective MedQA dataset and by developing our own incremental-reasoning prompt. Our evaluations show that the incremental-reasoning prompt performs better than the modified Codex prompt in certain scenarios. We also show that greedy decoding with the incremental-reasoning method performs better than other strategies, such as prompt chaining and eliminative reasoning.
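To make two of the terms above concrete, the following is a minimal sketch of few-shot CoT prompt assembly and of greedy decoding. It is not the prompt or the model used in this work: the exemplar layout (`Question` / `Reasoning` / `Answer`), the function names `build_cot_prompt` and `greedy_decode`, and the toy scorer `toy_scores` are all assumptions introduced for illustration only.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical exemplar layout: (question, step-by-step reasoning, final answer).
Exemplar = Tuple[str, str, str]


def build_cot_prompt(exemplars: List[Exemplar], question: str) -> str:
    """Concatenate worked exemplars, then append the new question with an open reasoning slot."""
    parts = [
        f"Question: {q}\nReasoning: {reasoning}\nAnswer: {answer}\n"
        for q, reasoning, answer in exemplars
    ]
    parts.append(f"Question: {question}\nReasoning:")
    return "\n".join(parts)


def greedy_decode(
    next_token_scores: Callable[[List[str]], Dict[str, float]],
    prefix: List[str],
    eos: str = "<eos>",
    max_new_tokens: int = 32,
) -> List[str]:
    """At every step, pick the single highest-scoring next token (no sampling, no beam)."""
    tokens = list(prefix)
    generated: List[str] = []
    for _ in range(max_new_tokens):
        scores = next_token_scores(tokens)
        best = max(scores, key=scores.get)  # argmax over the candidate tokens
        if best == eos:
            break
        tokens.append(best)
        generated.append(best)
    return generated


def toy_scores(tokens: List[str]) -> Dict[str, float]:
    """Toy stand-in for a language model: favours a fixed script, one token per step."""
    script = ["fever", "suggests", "infection", "<eos>"]
    step = min(len(tokens), len(script) - 1)
    return {tok: (1.0 if i == step else 0.1) for i, tok in enumerate(script)}


if __name__ == "__main__":
    print(build_cot_prompt([("Q1", "R1", "A1")], "Q2"))
    print(greedy_decode(toy_scores, []))
```

In a real setting the scorer would be the language model's next-token distribution conditioned on the assembled prompt; the deterministic argmax is what distinguishes greedy decoding from sampling-based strategies.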