Statutory reasoning is the task of reasoning with facts and statutes, which are rules written in natural language by a legislature. It is a basic legal skill. In this paper, we explore the capabilities of the most capable GPT-3 model, text-davinci-003, on an established statutory-reasoning dataset called SARA. We consider a variety of approaches, including dynamic few-shot prompting, chain-of-thought prompting, and zero-shot prompting. While we achieve results with GPT-3 that surpass the previous best published results, we also identify several types of clear errors the model makes. In investigating why these errors occur, we discover that GPT-3 has imperfect prior knowledge of the actual U.S. statutes on which SARA is based. More importantly, GPT-3 performs poorly at answering straightforward questions about simple synthetic statutes. By posing the same questions with the synthetic statutes rewritten in sentence form, we find that some of GPT-3's poor performance stems from difficulty in parsing the typical structure of statutes, which are organized into subsections and paragraphs.