We present a hybrid retrieval system for COVID-19 scientific literature, evaluated on the TREC-COVID benchmark (171,332 papers, 50 expert queries). The system implements six retrieval configurations spanning sparse retrieval (SPLADE), dense retrieval (BGE), rank-level fusion (RRF), and a projection-based vector fusion approach (B5). RRF achieves the best relevance (nDCG@10 = 0.828), outperforming dense-only retrieval by 6.1% and sparse-only retrieval by 14.9%. Compared with RRF, our projection fusion variant reaches a lower nDCG@10 of 0.678 on expert queries but is 33% faster (847 ms vs. 1271 ms) and produces 2.2x higher ILD@10. Evaluation across 400 queries (expert, machine-generated, and three paraphrase styles) shows that B5 delivers the largest relative gain on keyword-heavy reformulations (+8.8%), although RRF remains best in absolute nDCG@10. On expert queries, MMR reranking increases intra-list diversity by 23.8-24.5% at a cost of 20.4-25.4% in nDCG@10. Both fusion pipelines evaluated for latency stay under the 2 s target on all query sets. The system is deployed as a Streamlit web application backed by Pinecone serverless indices.
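For readers unfamiliar with rank-level fusion, the following is a minimal sketch of reciprocal rank fusion (RRF) over two ranked lists, such as one from a sparse retriever and one from a dense retriever. It is illustrative only and is not taken from the system described above: the function name `rrf_fuse`, the document IDs, and the smoothing constant `k = 60` (a commonly used default) are assumptions, since the abstract does not state the settings used.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, top_n=10):
    """Fuse ranked lists of document IDs with reciprocal rank fusion.

    Each document receives 1 / (k + rank) from every list it appears in,
    with ranks starting at 1. The value of k is an assumed default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by ID for determinism.
    fused = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return fused[:top_n]


# Hypothetical top-3 results from a sparse and a dense retriever.
sparse_hits = ["d3", "d1", "d7"]
dense_hits = ["d1", "d9", "d3"]

fused = rrf_fuse([sparse_hits, dense_hits])
for doc_id, score in fused:
    print(f"{doc_id}\t{score:.5f}")
```

Because RRF uses only rank positions, it needs no score normalization between the sparse and dense retrievers; documents that appear in both lists (here `d1` and `d3`) accumulate two contributions and rise above documents found by only one retriever.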