Medical open-domain question answering demands substantial access to specialized knowledge. Recent efforts have sought to decouple knowledge from model parameters, counteracting architectural scaling and allowing for training on common low-resource hardware. The retrieve-then-read paradigm has become ubiquitous, with model predictions grounded on relevant knowledge pieces from external repositories such as PubMed, textbooks, and UMLS. An alternative path, still under-explored but made possible by the advent of domain-specific large language models, entails constructing artificial contexts through prompting. As a result, "to generate or to retrieve" is the modern equivalent of Hamlet's dilemma. This paper presents MedGENIE, the first generate-then-read framework for multiple-choice question answering in medicine. We conduct extensive experiments on MedQA-USMLE, MedMCQA, and MMLU, incorporating a practical perspective by assuming a maximum of 24GB VRAM. MedGENIE sets a new state of the art in the open-book setting of each testbed, allowing a small-scale reader to outcompete zero-shot closed-book 175B baselines while using up to 706$\times$ fewer parameters. Our findings reveal that generated passages yield higher accuracy than retrieved ones.
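To make the generate-then-read idea concrete, the following is a minimal sketch of the control flow only, not MedGENIE's actual implementation. It assumes two hypothetical callables: a `generator` that maps a prompt to an artificial context, and a `reader` that maps the assembled input to an option letter. The prompt wording, the input layout, the function names, and the toy stand-ins are all illustrative assumptions; in practice the generator would be a domain-specific large language model and the reader a small-scale model.

```python
from typing import Callable, Dict, List


def build_reader_input(question: str, options: Dict[str, str], contexts: List[str]) -> str:
    """Join generated contexts, the question, and the options into one string.

    The layout below is an assumption made for illustration.
    """
    context_block = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(contexts))
    option_block = "\n".join(f"{key}. {text}" for key, text in sorted(options.items()))
    return f"Context:\n{context_block}\n\nQuestion: {question}\n{option_block}\nAnswer:"


def generate_then_read(
    question: str,
    options: Dict[str, str],
    generator: Callable[[str], str],
    reader: Callable[[str], str],
    num_contexts: int = 2,
) -> str:
    """Generate artificial contexts by prompting, then let the reader answer."""
    # Hypothetical prompt; the source does not specify the prompt wording.
    prompt = f"Write background knowledge that helps answer: {question}"
    contexts = [generator(prompt) for _ in range(num_contexts)]
    return reader(build_reader_input(question, options, contexts))


def toy_generator(prompt: str) -> str:
    """Stand-in for a domain-specific LLM; returns a fixed passage."""
    return "Metformin is a first-line drug for type 2 diabetes."


def toy_reader(reader_input: str) -> str:
    """Stand-in for a trained reader: picks the first option found in the context."""
    context, _, rest = reader_input.partition("\n\nQuestion:")
    for line in rest.splitlines():
        letter, sep, text = line.partition(". ")
        if sep and len(letter) == 1 and text.lower() in context.lower():
            return letter
    return "A"  # arbitrary fallback when no option text appears in the context


if __name__ == "__main__":
    answer = generate_then_read(
        "Which drug is first-line for type 2 diabetes?",
        {"A": "Insulin", "B": "Metformin"},
        toy_generator,
        toy_reader,
    )
    print(answer)
```

A retrieve-then-read pipeline would have the same shape, with the `generator` call replaced by a lookup into an external repository. Keeping the two stages behind plain callables makes that swap a one-line change.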