Classification of US Supreme Court Cases using BERT-Based Techniques

Models based on bidirectional encoder representations from transformers (BERT) produce state-of-the-art (SOTA) results on many natural language processing (NLP) tasks, such as named entity recognition (NER) and part-of-speech (POS) tagging. Long documents, such as US Supreme Court decisions, are an interesting exception: BERT-based models are difficult to apply to them on a first-pass or out-of-the-box basis. In this paper, we experiment with several BERT-based classification techniques on US Supreme Court decisions in the Supreme Court Database (SCDB) and compare them with the previous SOTA results. We then compare our results specifically with SOTA models for long documents. We evaluate on two classification tasks: (1) a broad classification task with 15 categories and (2) a fine-grained classification task with 279 categories. Our best result is an accuracy of 80% on the 15 broad categories and 60% on the 279 fine-grained categories, an improvement of 8% and 28%, respectively, over previously reported SOTA results.
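
One reason long documents are awkward for BERT out of the box is that standard BERT checkpoints accept at most 512 tokens per input. The abstract does not say which techniques the authors use, so the following is only a minimal sketch of one common workaround, not the paper's method: split the token sequence into overlapping windows, classify each window, and take a majority vote. The stride of 256, the function names, and the `classify_window` callback (a stand-in for any per-window classifier) are assumptions made for illustration.

```python
from collections import Counter
from typing import Callable, List, Sequence

MAX_LEN = 512  # input limit of standard BERT checkpoints, special tokens included
STRIDE = 256   # hypothetical step between window starts, chosen for illustration


def split_into_windows(tokens: Sequence[str],
                       max_len: int = MAX_LEN,
                       stride: int = STRIDE) -> List[List[str]]:
    """Split tokens into overlapping windows of at most max_len - 2 tokens.

    Two positions are reserved for the [CLS] and [SEP] special tokens.
    """
    body = max_len - 2
    if not 0 < stride <= body:
        raise ValueError("stride must be in 1..max_len - 2 to cover every token")
    windows = []
    start = 0
    while True:
        windows.append(list(tokens[start:start + body]))
        if start + body >= len(tokens):
            break  # this window reached the end of the document
        start += stride
    return windows


def classify_long_document(tokens: Sequence[str],
                           classify_window: Callable[[List[str]], int]) -> int:
    """Classify every window and return the most frequent label.

    classify_window is a placeholder for a real model call (assumption).
    """
    votes = Counter(classify_window(w) for w in split_into_windows(tokens))
    return votes.most_common(1)[0][0]
```

With a real model, `classify_window` would wrap the tokenizer and the fine-tuned classifier. Majority voting is only one way to aggregate; averaging per-window class probabilities is a common alternative.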

