We present Spacerini, a tool that integrates the Pyserini toolkit for reproducible information retrieval research with Hugging Face to enable the seamless construction and deployment of interactive search engines. Spacerini makes state-of-the-art sparse and dense retrieval models more accessible to non-IR practitioners while minimizing deployment effort. This is useful for NLP researchers who want to better understand and validate their research by performing qualitative analyses of training corpora, for IR researchers who want to demonstrate new retrieval models integrated into the growing Pyserini ecosystem, and for third parties reproducing the work of other researchers. Spacerini is open source and includes utilities for loading, preprocessing, indexing, and deploying search engines locally and remotely. We demonstrate a portfolio of 13 search engines created with Spacerini for different use cases.