This study investigates how effectively different information retrieval (IR) systems retrieve relevant scientific literature during the COVID-19 pandemic. It applies the TREC framework to the COVID-19 Open Research Dataset (CORD-19) and evaluates three IR approaches: BM25, Contriever, and Bag of Embeddings. The objective is to build a test collection for search engines that must handle the complex information landscape of a pandemic. The IR models are trained and evaluated on CORD-19, and their results are compared against the manual relevance labels from the TREC-COVID IR Challenge. The results indicate that advanced IR models such as BERT and Contriever retrieve relevant information more effectively during a pandemic. The study also highlights the difficulty of processing large datasets and the need for strategies that focus on abstracts or summaries. Overall, the research underscores the importance of well-tailored IR systems for coping with information overload during crises like COVID-19, and it can guide future research and development in this field.
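To make the evaluation setup concrete, the following is a minimal sketch of ranking documents with BM25 and scoring the ranking against manual relevance labels. It is not the study's implementation: the function names `tokenize`, `bm25_rank`, and `precision_at_k`, the whitespace tokenizer, the parameter values `k1=1.5` and `b=0.75`, the smoothed IDF variant, and the three toy documents are all assumptions made for illustration. The study itself may use other metrics and tooling.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer; a placeholder for a real analyzer."""
    return text.lower().split()


def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Return document ids sorted by descending BM25 score for the query."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized.values()) / n_docs

    # Document frequency: number of documents containing each term.
    df = Counter()
    for toks in tokenized.values():
        df.update(set(toks))

    query_terms = set(tokenize(query))
    scores = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF (assumed variant) that stays non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[doc_id] = score
    # Break score ties by document id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k ranked ids that are labeled relevant."""
    return sum(1 for d in ranking[:k] if d in relevant) / k


# Toy abstracts and relevance labels, invented for illustration only.
docs = {
    "d1": "coronavirus transmission in hospital settings",
    "d2": "protein folding in yeast",
    "d3": "coronavirus vaccine trial results",
}
relevant = {"d1", "d3"}

ranking = bm25_rank("coronavirus transmission", docs)
print(ranking, precision_at_k(ranking, relevant, 2))
```

A dense retriever such as Contriever could be evaluated the same way by replacing `bm25_rank` with a function that ranks documents by embedding similarity, while keeping the relevance labels and the metric fixed.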