Legal language can be understood as the language typically used by members of the legal profession, and it may take either spoken or written form. Recent legislation on cybersecurity uses legal language in writing and therefore inherits its interpretative complications, which stem from the typical abundance of cases and sub-cases and from a general richness of detail. This paper addresses the challenge of the essential interpretation of the legal language of cybersecurity, namely the extraction of the essential Parts of Speech (POS) from legal documents concerning cybersecurity. We meet this challenge with a methodology for POS tagging of legal language. It leverages state-of-the-art open-source tools for Natural Language Processing (NLP), together with manual analysis to validate the tools' outcomes. As a result, the methodology is automated and, arguably, general for any legal language following minor tailoring of the preprocessing step. We demonstrate it on the most relevant EU legislation on cybersecurity, namely the NIS 2 directive, producing the first, albeit essential, structured interpretation of this document. Moreover, our findings indicate that tools such as SpaCy and ClausIE reach their limits on the legal language of NIS 2.
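To make the "minor tailoring of the preprocessing step" more concrete, the following is a minimal sketch of what such a step might look like before text is handed to a POS tagger. It is an illustration only, not the preprocessing used in the paper: the function name `preprocess`, the two regular expressions, and the sample sentences are all assumptions introduced here, and the sample text is invented rather than quoted from NIS 2.

```python
import re

# Hypothetical patterns: the paper's actual preprocessing rules are not
# stated in the abstract, so these are illustrative assumptions.
# Matches a leading enumeration marker such as "1.", "(2)", "(a)" or "(iv)".
ENUM_MARKER = re.compile(r"^\s*(?:\(\d+\)|\([a-z]\)|\([ivx]+\)|\d+\.)\s+")
WHITESPACE = re.compile(r"\s+")


def preprocess(lines):
    """Strip leading enumeration markers and collapse runs of whitespace.

    Empty lines (after cleaning) are dropped.
    """
    cleaned = []
    for line in lines:
        text = ENUM_MARKER.sub("", line)
        text = WHITESPACE.sub(" ", text).strip()
        if text:
            cleaned.append(text)
    return cleaned


# Invented sample lines in the style of a legal text (not quoted from NIS 2).
sample = [
    "1.   This Directive lays down measures.",
    "(a)  Member States shall adopt   national strategies;",
    "",
]
print(preprocess(sample))
```

The cleaned sentences could then be passed to an NLP tool of the kind named in the abstract. Adapting the sketch to another body of legal text would mainly mean changing the enumeration patterns.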