Prompt engineering is crucial for harnessing the potential of large language models (LLMs), especially in the medical domain, where specialized terminology and phrasing are used. However, the efficacy of prompt engineering in the medical domain remains to be explored. In this work, we review 114 recent studies (2022-2024) applying prompt engineering in medicine, covering prompt learning (PL), prompt tuning (PT), and prompt design (PD). PD is the most prevalent (78 articles). In 12 papers, the terms PD, PL, and PT were used interchangeably. ChatGPT is the most commonly used LLM, with seven papers using it to process sensitive clinical data. Chain-of-Thought emerges as the most common prompt engineering technique. While PL and PT articles typically provide a baseline for evaluating prompt-based approaches, 64% of PD studies lack non-prompt-related baselines. We provide tables and figures summarizing existing work, along with reporting recommendations to guide future research contributions.