Statutory reasoning is the task of reasoning with facts and statutes, which are rules written in natural language by a legislature. It is a basic legal skill. In this paper we evaluate the most capable GPT-3 model, text-davinci-003, on an established statutory-reasoning dataset called SARA. We consider a variety of approaches, including dynamic few-shot prompting, chain-of-thought prompting, and zero-shot prompting. While our results with GPT-3 are better than the previous best published results, we also identify several types of clear errors that it makes. Investigating why these errors happen, we discover that GPT-3 has imperfect prior knowledge of the actual U.S. statutes on which SARA is based. More importantly, we create simple synthetic statutes, which GPT-3 is guaranteed not to have seen during training, and find that GPT-3 performs poorly at answering straightforward questions about them.