LEXA: Legal Case Retrieval via Graph Contrastive Learning with Contextualised LLM Embeddings

Legal case retrieval (LCR) is a specialised information retrieval task that aims to identify relevant cases given a query case. LCR plays a pivotal role in helping legal practitioners locate legal precedents. Existing LCR methods predominantly rely on traditional lexical models or language models; however, they typically overlook the domain-specific structural information embedded in legal documents. Our previous work, CaseGNN, successfully harnesses text-attributed graphs and graph neural networks to incorporate structural legal information. Nonetheless, three key challenges remain in enhancing the representational capacity of CaseGNN: (1) the under-utilisation of rich edge information in the text-attributed case graph (TACG); (2) the insufficiency of training signals for graph contrastive learning; and (3) the lack of contextualised legal information in node and edge features. In this paper, the LEXA model, an extension of CaseGNN, is proposed to overcome these limitations by jointly leveraging rich edge information, enhanced training signals, and contextualised embeddings derived from large language models (LLMs). Specifically, an edge-updated graph attention layer (EUGAT) is proposed to update both node and edge features during graph modelling, making full use of the structural information in legal cases. Moreover, LEXA incorporates a novel graph contrastive learning objective with graph augmentation to provide additional training signals, thereby strengthening the model's legal comprehension capabilities. Furthermore, LLMs are employed to generate the node and edge features of the TACG. Extensive experiments on two benchmark datasets demonstrate that LEXA not only significantly improves on CaseGNN but also achieves superior performance compared to state-of-the-art LCR methods.
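To make the idea of a contrastive training signal concrete, the following is a minimal sketch of a generic InfoNCE-style loss over case embeddings. It is not the LEXA objective itself, which the abstract does not specify; the function name `info_nce_loss`, the dot-product similarity, and the `temperature` value are all assumptions made for illustration. One plausible reading is that the embedding of an augmented view of a case graph could be supplied as a positive or negative example, but that too is an assumption.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors given as lists of floats."""
    return sum(a * b for a, b in zip(u, v))


def info_nce_loss(query, positive, negatives, temperature=0.1):
    """Generic InfoNCE-style loss for one query embedding (illustrative only).

    query, positive: embeddings of a query case and a relevant case.
    negatives: list of embeddings of irrelevant cases.
    temperature: assumed scaling hyperparameter, not taken from the paper.
    """
    # Similarity scores; the positive is placed at index 0.
    logits = [dot(query, positive) / temperature]
    logits += [dot(query, n) / temperature for n in negatives]

    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))

    # Negative log-probability of picking the positive among all candidates.
    return log_denominator - logits[0]


if __name__ == "__main__":
    q = [1.0, 0.0]
    relevant = [1.0, 0.0]
    irrelevant = [0.0, 1.0]
    # Small loss when the relevant case is closest to the query.
    print(info_nce_loss(q, relevant, [irrelevant]))
    # Large loss when an irrelevant case is closer than the relevant one.
    print(info_nce_loss(q, irrelevant, [relevant]))
```

The loss shrinks as the relevant case scores higher than the irrelevant ones, so adding more (or harder) contrast examples per query is one way such an objective can receive additional training signal.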
