This work addresses the challenge of capturing the complexities of legal knowledge by proposing a multi-layered embedding-based retrieval method for legal and legislative texts. By creating embeddings not only for individual articles but also for their components (paragraphs, clauses) and structural groupings (books, titles, chapters, etc.), we seek to capture the subtleties of legal information with dense embedding vectors that represent it at varying levels of granularity. Our method serves diverse information needs by allowing a Retrieval Augmented Generation (RAG) system to return accurate responses tailored to the user's query, whether the query concerns a specific segment or an entire section. We explore the concepts of aboutness, semantic chunking, and the inherent hierarchy of legal texts, arguing that this method enhances legal information retrieval. Although we focus on Brazil's legislative methods and the Brazilian Constitution, which follow the civil law tradition, our findings should in principle apply across different legal systems, including those in the common law tradition. Furthermore, the principles of the proposed method extend beyond the legal domain, offering insights for organizing and retrieving information in any field where information is encoded in hierarchical text.
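The core idea of indexing every level of the hierarchy, not just leaf articles, can be illustrated with a minimal sketch. It assumes a simple tree representation of the legal text; the `Node` class and the functions `full_text`, `collect_chunks`, `build_index`, and `retrieve` are hypothetical names for illustration, and `toy_embed` is a bag-of-words stand-in for the dense embedding model the method actually relies on. The sample text is simplified and illustrative, not the wording of the Brazilian Constitution.

```python
from __future__ import annotations

import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """One unit of the legal hierarchy (hypothetical representation)."""
    level: str                      # e.g. "title", "chapter", "article", "paragraph"
    label: str                      # e.g. "Art. 5", "§ 1"
    text: str = ""                  # text held directly by this node; empty for pure groupings
    children: list[Node] = field(default_factory=list)


def full_text(node: Node) -> str:
    """Concatenate the node's own text with all descendant text, in document order."""
    parts = [node.text] + [full_text(child) for child in node.children]
    return "\n".join(p for p in parts if p)


def collect_chunks(node: Node, path: tuple[str, ...] = ()) -> list[tuple[tuple[str, ...], str, str]]:
    """Emit one (path, level, text) chunk for every node, at every level of the hierarchy."""
    here = path + (node.label,)
    chunks = [(here, node.level, full_text(node))]
    for child in node.children:
        chunks.extend(collect_chunks(child, here))
    return chunks


def toy_embed(text: str) -> Counter:
    """Stand-in for a dense embedding model (assumption): a bag-of-words term-count vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(root: Node) -> list[tuple[tuple[str, ...], str, Counter]]:
    """Embed every chunk so that all granularities live in one searchable index."""
    return [(path, level, toy_embed(text)) for path, level, text in collect_chunks(root)]


def retrieve(query: str, index, k: int = 3):
    """Return the top-k (score, path, level) triples, ranked across all levels at once."""
    q = toy_embed(query)
    scored = [(cosine(q, vec), path, level) for path, level, vec in index]
    scored.sort(key=lambda s: s[0], reverse=True)
    return scored[:k]


# Illustrative, simplified text; not the actual wording of any constitution.
root = Node("title", "Title II", children=[
    Node("chapter", "Chapter I", children=[
        Node("article", "Art. 5", "All persons are equal before the law.", children=[
            Node("paragraph", "§ 1", "Rules on fundamental rights apply immediately."),
            Node("paragraph", "§ 2", "Rights listed here do not exclude others arising from treaties."),
        ]),
    ]),
])

index = build_index(root)
for score, path, level in retrieve("treaties", index):
    print(f"{score:.3f}  {level:<9}  {' > '.join(path)}")
```

Because every node yields its own chunk, a narrow query such as "treaties" is answered by the paragraph-level chunk, while the article, chapter, and title chunks remain available for broader queries. Note that a grouping with a single child (here, the title and chapter) produces the same text as that child, so their scores tie; a real system might deduplicate such chunks or prefer the most specific one.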