Confusing charge prediction is a challenging task in legal AI: given a fact description, the model must choose among charges that are easily mistaken for one another, such as Snatch and Robbery. Existing charge prediction methods perform well overall but struggle on such confusing charges. In the legal domain, constituent elements play a pivotal role in distinguishing confusing charges. Constituent elements are the fundamental behaviors underlying criminal punishment, and they differ only subtly between similar charges. In this paper, we propose a novel From Graph to Word Bag (FWGB) approach, which injects domain knowledge about constituent elements to guide the model's judgments on confusing charges, much like a judge's reasoning process. Specifically, we first construct a legal knowledge graph containing constituent elements and use it to select keywords for each charge, forming a word bag. Then, to direct the model's attention to the information in the context that differentiates each charge, we extend the attention mechanism and introduce a new loss function that supervises attention using the words in the word bag. We construct a confusing-charge dataset from real-world judicial documents. Experiments demonstrate the effectiveness of our method, especially in maintaining strong performance under imbalanced label distributions.
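The attention supervision described in the abstract can be illustrated with a minimal sketch. The abstract does not give the form of the loss, so everything here is an assumption made for illustration: the target is a uniform distribution over the tokens that appear in a charge's word bag, the supervision term is a cross-entropy between that target and the model's attention weights, and it is added to the classification loss with a hypothetical weight `lam`. The function names and the example word bag are likewise hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_target(tokens, word_bag):
    """Assumed target: uniform mass on tokens that occur in the word bag.

    Returns None when no token matches, meaning there is no supervision
    signal for this example.
    """
    hits = [1.0 if tok in word_bag else 0.0 for tok in tokens]
    n_hits = sum(hits)
    if n_hits == 0:
        return None
    return [h / n_hits for h in hits]


def attention_supervision_loss(attn, target, eps=1e-12):
    """Assumed supervision term: cross-entropy of attention against target."""
    return -sum(t * math.log(a + eps) for t, a in zip(target, attn))


def total_loss(class_probs, gold, attn, target, lam=0.5, eps=1e-12):
    """Classification cross-entropy plus the weighted attention term.

    `lam` is a hypothetical weighting hyperparameter.
    """
    ce = -math.log(class_probs[gold] + eps)
    if target is None:
        return ce
    return ce + lam * attention_supervision_loss(attn, target, eps)


# Hypothetical example: a word bag for Robbery and a short fact description.
tokens = ["he", "grabbed", "the", "bag", "using", "violence"]
robbery_bag = {"violence", "threatened"}
target = attention_target(tokens, robbery_bag)

focused = softmax([0.0, 0.0, 0.0, 0.0, 0.0, 5.0])  # attends to "violence"
diffuse = softmax([0.0] * len(tokens))             # attends uniformly
print(attention_supervision_loss(focused, target))
print(attention_supervision_loss(diffuse, target))
```

Under these assumptions, attention concentrated on word-bag tokens incurs a smaller supervision penalty than diffuse attention, which is the behavior the abstract attributes to the new loss.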