We apply named entity recognition (NER) to a particular sub-genre of German legal texts: legal norms regulating administrative processes in public service administration. Analyzing such texts involves identifying stretches of text that instantiate one of ten classes defined by public service administration professionals. We investigate and compare three methods for performing NER to detect these classes: a rule-based system, deep discriminative models, and a deep generative model. Our results show that the deep discriminative models outperform both the rule-based system and the deep generative model; the latter two perform roughly equally well, each outperforming the other in different classes. The main cause of this somewhat surprising result is arguably that the classes used in the analysis are semantically and syntactically heterogeneous, in contrast to the classes used in more standard NER tasks. Deep discriminative models appear to be better equipped to deal with this heterogeneity than both generic LLMs and human linguists designing rule-based NER systems.