This paper describes our entry for the COLING 2025 RegNLP RIRAG (Regulatory Information Retrieval and Answer Generation) challenge, focusing on information retrieval and answer generation techniques for regulatory domains. We experimented with a combination of embedding models, including Stella, BGE, CDE, and MPNet, and applied fine-tuning and reranking to place relevant documents in the top ranks. Our novel approach, LeSeR, achieved competitive retrieval results, with a recall@10 of 0.8201 and a MAP@10 of 0.6655. This work highlights the potential of natural language processing techniques in regulatory applications, offering insights into their capabilities for implementing a retrieval-augmented generation system while identifying areas for future improvement in robustness and domain adaptation.
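For readers unfamiliar with the two retrieval metrics quoted above, the following is a minimal sketch of how recall@k and MAP@k are commonly computed. It is not the challenge's official evaluation script: the function names, the passage IDs, and the choice to normalize average precision by min(|relevant|, k) are assumptions made for illustration, and other tools normalize by the full number of relevant passages instead.

```python
from typing import Dict, List, Set


def recall_at_k(ranked: List[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the relevant passages that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = len(set(ranked[:k]) & relevant)
    return hits / len(relevant)


def average_precision_at_k(ranked: List[str], relevant: Set[str], k: int = 10) -> float:
    """Average of precision values at each rank (<= k) where a relevant passage occurs.

    Assumption: normalized by min(|relevant|, k); ranked lists contain no duplicates.
    """
    if not relevant:
        return 0.0
    hits = 0
    score = 0.0
    for rank, doc_id in enumerate(ranked[:k], start=1):
        if doc_id in relevant:
            hits += 1
            score += hits / rank  # precision at this rank
    return score / min(len(relevant), k)


def mean_over_queries(metric, runs: Dict[str, List[str]],
                      qrels: Dict[str, Set[str]], k: int = 10) -> float:
    """Average a per-query metric over all queries that have relevance labels."""
    if not qrels:
        return 0.0
    values = [metric(runs.get(qid, []), rel, k) for qid, rel in qrels.items()]
    return sum(values) / len(values)


# Hypothetical toy data: one query, four retrieved passages, two relevant ones.
runs = {"q1": ["p3", "p1", "p7", "p2"]}
qrels = {"q1": {"p1", "p2"}}

recall_10 = mean_over_queries(recall_at_k, runs, qrels, k=10)
map_10 = mean_over_queries(average_precision_at_k, runs, qrels, k=10)
```

In this toy example both relevant passages are retrieved within the top 10, so recall@10 is 1.0, while MAP@10 is lower because the relevant passages sit at ranks 2 and 4 rather than at the top. This is why a reranking stage can improve MAP even when recall stays the same.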