In scientific research, the ability to effectively retrieve relevant documents based on complex, multifaceted queries is critical. Existing evaluation datasets for this task are limited, primarily due to the high cost and effort required to annotate resources that effectively represent complex queries. To address this, we propose a novel task, Scientific DOcument Retrieval using Multi-level Aspect-based quEries (DORIS-MAE), which is designed to handle the complex nature of user queries in scientific research. We developed a benchmark dataset within the field of computer science, consisting of 100 human-authored complex query cases. For each complex query, we assembled a collection of 100 relevant documents and produced annotated relevance scores for ranking them. Recognizing the significant labor of expert annotation, we also introduce Anno-GPT, a scalable framework for validating the performance of Large Language Models (LLMs) on expert-level dataset annotation tasks. LLM annotation of the DORIS-MAE dataset resulted in a 500x reduction in cost, without compromising quality. Furthermore, due to the multi-tiered structure of these complex queries, the DORIS-MAE dataset can be extended to over 4,000 sub-query test cases without requiring additional annotation. We evaluated 17 recent retrieval methods on DORIS-MAE, observing notable performance drops compared to traditional datasets. This highlights the need for better approaches to handle complex, multifaceted queries in scientific research. Our dataset and codebase are available at https://github.com/Real-Doris-Mae/Doris-Mae-Dataset.
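The claim that the multi-tiered query structure yields many sub-query test cases without extra annotation can be illustrated with a minimal sketch. It assumes, hypothetically, that each complex query is broken into aspects, that every (aspect, document) pair carries one annotated relevance score, and that a document's relevance to a sub-query is the sum of its scores over the aspects in that sub-query. The names `aspect_scores`, `rank_documents`, and `sub_query_rankings`, the toy scores, and the summation rule are assumptions made for illustration; they are not taken from the DORIS-MAE codebase.

```python
from itertools import combinations

# Hypothetical annotations: aspect_scores[aspect][doc_id] -> relevance score.
# The aspect names, document ids, and scores are made up for this sketch.
aspect_scores = {
    "a1": {"d1": 2, "d2": 0, "d3": 1},
    "a2": {"d1": 0, "d2": 2, "d3": 1},
    "a3": {"d1": 1, "d2": 1, "d3": 2},
}


def rank_documents(aspects):
    """Rank documents by summed aspect scores (assumed aggregation rule)."""
    totals = {}
    for aspect in aspects:
        for doc_id, score in aspect_scores[aspect].items():
            totals[doc_id] = totals.get(doc_id, 0) + score
    # Highest total first; ties are broken by document id for determinism.
    return sorted(totals, key=lambda doc_id: (-totals[doc_id], doc_id))


def sub_query_rankings(min_aspects=2):
    """Build a ranking for every aspect subset with at least min_aspects."""
    all_aspects = sorted(aspect_scores)
    rankings = {}
    for size in range(min_aspects, len(all_aspects) + 1):
        for subset in combinations(all_aspects, size):
            # Existing aspect-level scores are reused; nothing is re-annotated.
            rankings[subset] = rank_documents(subset)
    return rankings


if __name__ == "__main__":
    for subset, ranking in sub_query_rankings().items():
        print(subset, ranking)
```

Under these assumptions, three annotated aspects already give four test cases (three two-aspect sub-queries plus the full query), which is the sense in which aspect-level annotation can be reused across sub-queries.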