In this paper, we summarize the current state of the field of NLP & Law, with a specific focus on recent technical and substantive developments. To support our analysis, we construct and analyze a nearly complete corpus of more than six hundred NLP & Law related papers published over the past decade. Our analysis highlights several major trends. Namely, we document an increasing number of papers written, tasks undertaken, and languages covered over that period. We also observe an increase in the sophistication of the methods that researchers deploy in this applied context. Slowly but surely, Legal NLP is beginning to match not only the methodological sophistication of general NLP but also the professional standards of data availability and code reproducibility observed within the broader scientific community. We believe all of these trends bode well for the future of the field, but many questions in both the academic and commercial spheres remain open.