Because the dense retrievers that now dominate the field do not generalize well beyond their training data distributions, domain-specific test sets are essential for evaluating retrieval. Few test datasets exist for retrieval systems intended for use by healthcare providers in a point-of-care setting. To fill this gap, we collaborated with medical professionals to create CURE, an ad-hoc retrieval test dataset for passage ranking with 2000 queries spanning 10 medical domains, with one monolingual (English) and two cross-lingual (French/Spanish -> English) conditions. In this paper, we describe how CURE was constructed and provide baseline results to showcase its effectiveness as an evaluation tool. CURE is published under a Creative Commons Attribution-NonCommercial 4.0 license and can be accessed on Hugging Face.