Caveat Lector: Large Language Models in Legal Practice

The current fascination with large language models, or LLMs, derives from the fact that many users lack the expertise to evaluate the quality of the generated text. LLMs may therefore appear more capable than they actually are. The dangerous combination of fluency and superficial plausibility leads to the temptation to trust the generated text and creates the risk of overreliance. Who would not trust perfect legalese? Relying on recent findings in both technical and legal scholarship, this Article counterbalances the overly optimistic predictions as to the role of LLMs in legal practice. Integrating LLMs into legal workstreams without a better comprehension of their limitations will create inefficiencies, if not outright risks. Notwithstanding their unprecedented ability to generate text, LLMs do not understand text. Without the ability to understand meaning, LLMs will remain unable to use language, to acquire knowledge, and to perform complex reasoning tasks. Trained to model language on the basis of stochastic word predictions, LLMs cannot distinguish fact from fiction. Their knowledge of the law is limited to word strings memorized in their parameters. It is also incomplete and largely incorrect. LLMs operate at the level of word distributions, not at the level of verified facts. The resulting propensity to hallucinate, that is, to produce statements that are incorrect but appear helpful and relevant, is alarming in high-risk areas like legal services. At present, lawyers should beware of relying on text generated by LLMs.

