Answer retrieval in the legal domain aims to help users find relevant legal advice among massive amounts of professional responses. Two main challenges hinder applying answer retrieval approaches developed for other domains to the legal domain: (1) a large knowledge gap between lawyers and non-professionals; and (2) a mix of informal and formal content on legal QA websites. To tackle these challenges, we propose CE_FS, a novel cross-encoder (CE) re-ranker based on fine-grained structured inputs. CE_FS uses additional structured information in community question answering (CQA) data to improve the effectiveness of cross-encoder re-rankers. Furthermore, we propose LegalQA, a real-world benchmark dataset for evaluating answer retrieval in the legal domain. Experiments on LegalQA show that our proposed method significantly outperforms strong cross-encoder re-rankers fine-tuned on MS MARCO. Our novel finding is that adding each question's tags to the input of cross-encoder re-rankers in a structured way, alongside the question description and title, boosts the rankers' effectiveness. Although we study our method in the legal domain, we believe it can be applied to similar applications in other domains.
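To make the idea of fine-grained structured inputs concrete, the following is a minimal sketch of how a question's title, tags, and description might be combined into a single query string and paired with candidate answers for a cross-encoder. The field markers `[TITLE]`, `[TAGS]`, and `[DESC]`, the `Question` class, and both helper functions are hypothetical illustrations; the abstract does not specify the exact input format used by CE_FS.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Question:
    """A CQA question with its structured fields (hypothetical container)."""
    title: str
    description: str
    tags: List[str] = field(default_factory=list)


def build_structured_query(q: Question) -> str:
    """Join the question fields, each introduced by an explicit marker.

    The marker strings are assumptions made for illustration only.
    """
    parts = [f"[TITLE] {q.title.strip()}"]
    if q.tags:  # omit the tag field entirely when a question has no tags
        parts.append("[TAGS] " + ", ".join(t.strip() for t in q.tags))
    parts.append(f"[DESC] {q.description.strip()}")
    return " ".join(parts)


def build_pairs(q: Question, answers: List[str]) -> List[Tuple[str, str]]:
    """Pair the structured query with every candidate answer."""
    query = build_structured_query(q)
    return [(query, answer) for answer in answers]


question = Question(
    title="Can my landlord keep my deposit?",
    description="I moved out last month and have not received my deposit back.",
    tags=["landlord-tenant", "security deposit"],
)
pairs = build_pairs(question, ["Answer A", "Answer B"])
```

Each `(query, answer)` pair could then be scored by any cross-encoder re-ranker that accepts text pairs, for example the `CrossEncoder.predict` method in the sentence-transformers library, and the answers sorted by score.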