A better understanding of the legal analysis abilities of Large Language Models (LLMs) can contribute to improving the efficiency of legal services, governing artificial intelligence, and leveraging LLMs to identify inconsistencies in law. This paper explores LLM capabilities in applying tax law. We choose this area of law because it has a structure that allows us to set up automated validation pipelines across thousands of examples, requires logical reasoning and maths skills, and enables us to test LLM capabilities in a manner relevant to the real-world economic lives of citizens and companies. Our experiments demonstrate emerging legal understanding capabilities, with performance improving in each subsequent OpenAI model release. We experiment with retrieving and utilising the relevant legal authority to assess the impact of providing additional legal context to LLMs. Few-shot prompting, which presents examples of question-answer pairs, is also found to significantly enhance the performance of the most advanced model, GPT-4. The findings indicate that LLMs, particularly when combined with prompting enhancements and the correct legal texts, can perform at high levels of accuracy but not yet at the level of expert tax lawyers. As LLMs continue to advance, their ability to reason about law autonomously could have significant implications for the legal profession and AI governance.