A wide range of transformer-based language models have been proposed for information retrieval tasks. However, integrating these models into retrieval pipelines is often complex and requires substantial engineering effort. In this paper, we introduce Lightning IR, an easy-to-use PyTorch Lightning-based framework for applying transformer-based language models in retrieval scenarios. Lightning IR provides a modular and extensible architecture that supports all stages of a retrieval pipeline, from fine-tuning and indexing to searching and re-ranking. Designed to be scalable and reproducible, Lightning IR is available as open source: https://github.com/webis-de/lightning-ir.