A wide range of transformer-based language models have been proposed for information retrieval tasks. However, integrating transformer-based models into retrieval pipelines is often complex and requires substantial engineering effort. In this paper, we introduce Lightning IR, an easy-to-use PyTorch Lightning-based framework for applying transformer-based language models in retrieval scenarios. Lightning IR provides a modular and extensible architecture that supports all stages of a retrieval pipeline: from fine-tuning and indexing to searching and re-ranking. Designed to be scalable and reproducible, Lightning IR is available as open-source software at https://github.com/webis-de/lightning-ir.