Rhetorical Role Labeling (RRL) of legal documents is pivotal for downstream tasks such as summarization, semantic case search, and argument mining. Existing approaches often overlook the varying levels of difficulty inherent in the discourse styles of legal documents and in the rhetorical roles themselves. In this work, we propose HiCuLR, a hierarchical curriculum learning framework for RRL. It nests two curricula: a Rhetorical Role-level Curriculum (RC) on the outer layer and a Document-level Curriculum (DC) on the inner layer. DC categorizes documents by difficulty, using metrics such as deviation from a standard discourse structure, and exposes the model to them in easy-to-difficult order. RC progressively trains the model to discern distinctions between rhetorical roles, moving from coarse-grained to fine-grained. Our experiments on four RRL datasets demonstrate the efficacy of HiCuLR and highlight the complementary nature of DC and RC.
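
The nesting of the two curricula described above can be sketched as a training schedule. This is a minimal illustration under stated assumptions, not the authors' implementation: the role names in `CANONICAL_ORDER`, the coarse grouping in `ROLE_LEVELS`, the backward-transition count used as the deviation metric, and the equal-size bucketing are all hypothetical choices made for the example.

```python
from typing import Dict, Iterator, List, Sequence, Tuple

# Hypothetical canonical discourse order; real RRL datasets define their own roles.
CANONICAL_ORDER = ["FACTS", "ARGUMENT", "PRECEDENT", "RATIO", "RULING"]

# Hypothetical coarse-to-fine label groupings for the outer curriculum (RC).
# Level 0 merges roles into broad groups; level 1 uses the original roles.
ROLE_LEVELS: List[Dict[str, str]] = [
    {
        "FACTS": "BACKGROUND",
        "ARGUMENT": "REASONING",
        "PRECEDENT": "REASONING",
        "RATIO": "REASONING",
        "RULING": "DECISION",
    },
    {role: role for role in CANONICAL_ORDER},
]


def deviation(roles: Sequence[str]) -> int:
    """Assumed difficulty metric: number of adjacent transitions that
    move backwards relative to CANONICAL_ORDER."""
    rank = {role: i for i, role in enumerate(CANONICAL_ORDER)}
    return sum(1 for a, b in zip(roles, roles[1:]) if rank[b] < rank[a])


def document_buckets(docs: Sequence[List[str]], n_buckets: int) -> List[List[List[str]]]:
    """Inner curriculum (DC): sort documents from easy to difficult and
    split them into at most n_buckets consecutive buckets."""
    ranked = sorted(docs, key=deviation)
    if not ranked:
        return []
    size = -(-len(ranked) // n_buckets)  # ceiling division
    return [ranked[i:i + size] for i in range(0, len(ranked), size)]


def hiculr_schedule(
    docs: Sequence[List[str]], n_buckets: int
) -> Iterator[Tuple[int, List[List[str]]]]:
    """Yield (granularity level, relabeled document bucket) pairs.
    The outer loop walks label granularity from coarse to fine (RC);
    the inner loop walks documents from easy to difficult (DC)."""
    for level, mapping in enumerate(ROLE_LEVELS):
        for bucket in document_buckets(docs, n_buckets):
            yield level, [[mapping[role] for role in doc] for doc in bucket]
```

Each yielded pair would correspond to one training stage: the model is trained on that bucket with labels at that granularity before moving on. Whether earlier buckets are revisited at later stages is left open here, since the abstract does not specify it.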