CaseGNN: Graph Neural Networks for Legal Case Retrieval with Text-Attributed Graphs

Legal case retrieval is an information retrieval task in the legal domain that aims to retrieve cases relevant to a given query case. Recent research on legal case retrieval mainly relies on traditional bag-of-words models and language models. Although these methods have significantly improved retrieval accuracy, two challenges remain: (1) Neglect of legal structural information. Previous neural legal case retrieval models mostly encode the unstructured raw text of a case into a case representation, which discards important legal structural information and leads to poor case representations; (2) Limitation on lengthy legal text. Powerful BERT-based models impose a limit on input text length, which inevitably requires shortening the input via truncation or division, with a loss of legal context information. In this paper, a graph neural network-based legal case retrieval model, CaseGNN, is developed to tackle these challenges. To effectively utilise the legal structural information during encoding, a case is first converted into a Text-Attributed Case Graph (TACG), which is then passed through a purpose-designed Edge Graph Attention Layer and a readout function to obtain the case graph representation. The CaseGNN model is optimised with a contrastive loss that uses both easy and hard negative sampling. Because the text attributes in the case graph come from individual sentences, the input length restriction of language models is avoided without losing legal context. Extensive experiments on two benchmarks, COLIEE 2022 and COLIEE 2023, demonstrate that CaseGNN outperforms other state-of-the-art legal case retrieval methods. The code has been released at https://github.com/yanran-tang/CaseGNN.
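
The abstract names a contrastive loss with easy and hard negatives but does not give its form. As a rough illustration only, the sketch below assumes an InfoNCE-style objective over cosine similarities between case representations. The function names, the temperature value, and the toy vectors are hypothetical and are not taken from the released code. How the easy and hard negatives are sampled is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(query, positive, easy_negatives, hard_negatives,
                     temperature=0.1):
    """InfoNCE-style loss for one query case (assumed form, not the paper's).

    The positive is a relevant case; easy and hard negatives are irrelevant
    cases and are simply pooled into one denominator here.
    """
    pos_logit = cosine(query, positive) / temperature
    negatives = list(easy_negatives) + list(hard_negatives)
    logits = [pos_logit] + [cosine(query, n) / temperature for n in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - pos_logit


# Toy 2-d "case representations" (hypothetical values).
query = [1.0, 0.0]
loss = contrastive_loss(
    query,
    positive=[0.9, 0.1],
    easy_negatives=[[0.0, 1.0]],
    hard_negatives=[[0.7, 0.7]],
)
print(loss)
```

Under this assumed form, the loss shrinks as the query moves closer to the positive case relative to every negative, and a hard negative (one that is already similar to the query) contributes more to the denominator than an easy one.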
