This paper reports on the use of prompt engineering and GPT-3.5 for biomedical query-focused multi-document summarisation. Using GPT-3.5 and appropriate prompts, our system achieves top ROUGE-F1 results in the task of obtaining short paragraph-sized answers to biomedical questions in the 2023 BioASQ Challenge (BioASQ 11b). This paper confirms what has been observed in other domains: 1) prompts that incorporated few-shot samples generally improved on their zero-shot counterparts; 2) the largest improvement was achieved by retrieval-augmented generation. The fact that these prompts allow our top runs to rank within the top two runs of BioASQ 11b demonstrates the power of using appropriate prompts with Large Language Models in general, and GPT-3.5 in particular, for query-focused summarisation.
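To make the three prompt families named above concrete, the following is a minimal sketch of how zero-shot, few-shot, and retrieval-augmented prompts might be assembled for a biomedical question. It is an illustration only: the instruction wording, the function names, and the prompt layout are assumptions made for this example and are not the prompts used in the submitted runs. No model is called; the functions only build the prompt strings.

```python
from typing import List, Tuple

# Hypothetical instruction text; the wording used in the actual runs is not given here.
INSTRUCTION = "Answer the biomedical question in one short paragraph."


def zero_shot_prompt(question: str) -> str:
    """Instruction plus the question, with no examples and no context."""
    return f"{INSTRUCTION}\n\nQuestion: {question}\nAnswer:"


def few_shot_prompt(question: str, examples: List[Tuple[str, str]]) -> str:
    """Prepend solved (question, answer) pairs before the target question."""
    shots = "\n\n".join(f"Question: {q}\nAnswer: {a}" for q, a in examples)
    return f"{INSTRUCTION}\n\n{shots}\n\nQuestion: {question}\nAnswer:"


def rag_prompt(question: str, snippets: List[str]) -> str:
    """Retrieval-augmented variant: retrieved snippets are supplied as context."""
    context = "\n".join(f"[{i}] {s}" for i, s in enumerate(snippets, start=1))
    return (
        f"{INSTRUCTION} Use only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

In this sketch the few-shot variant differs from the zero-shot one only by the solved examples placed before the target question, and the retrieval-augmented variant differs only by the numbered context snippets, which keeps the comparison between variants easy to read.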