Large Language Models Pass the Turing Test

We evaluated four systems (ELIZA, GPT-4o, LLaMa-3.1-405B, and GPT-4.5) in two randomised, controlled, and pre-registered Turing tests on independent populations. Participants held 5-minute conversations simultaneously with another human participant and one of these systems, then judged which conversational partner they thought was human. When prompted to adopt a humanlike persona, GPT-4.5 was judged to be the human 73% of the time, significantly more often than interrogators selected the real human participant. LLaMa-3.1, with the same prompt, was judged to be the human 56% of the time, which is not significantly more or less often than the humans it was being compared to. The baseline models (ELIZA and GPT-4o) achieved win rates significantly below chance (23% and 21% respectively). The results constitute the first empirical evidence that any artificial system passes a standard three-party Turing test. They have implications for debates about what kind of intelligence is exhibited by Large Language Models (LLMs), and about the social and economic impacts these systems are likely to have.

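To make the phrase "significantly above or below chance" concrete, the sketch below runs an exact two-sided binomial test of each reported win rate against the 50% chance level. The win rates (73%, 56%, 23%, 21%) come from the abstract; the number of trials per system is not given there, so `HYPOTHETICAL_TRIALS = 100` is a placeholder assumption, and the resulting p-values are illustrative only. The abstract does not say which statistical test the authors used, so this should not be read as their analysis.

```python
from math import comb

# Win rates reported in the abstract (fraction of trials judged "human").
WIN_RATES = {
    "GPT-4.5 (persona prompt)": 0.73,
    "LLaMa-3.1 (persona prompt)": 0.56,
    "ELIZA": 0.23,
    "GPT-4o": 0.21,
}

# ASSUMPTION: the abstract does not state the number of trials per system.
# 100 is a hypothetical placeholder, not a figure from the study.
HYPOTHETICAL_TRIALS = 100


def binomial_two_sided_p(successes: int, trials: int, p_null: float = 0.5) -> float:
    """Exact two-sided binomial test.

    Sums the probability of every outcome that is no more likely under the
    null hypothesis than the observed outcome.
    """
    pmf = [
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(trials + 1)
    ]
    observed = pmf[successes]
    # Small relative tolerance guards against floating-point ties.
    total = sum(p for p in pmf if p <= observed * (1 + 1e-9))
    return min(1.0, total)


def assess(trials: int = HYPOTHETICAL_TRIALS, alpha: float = 0.05) -> dict:
    """Return {system: (p_value, verdict)} for each reported win rate."""
    results = {}
    for system, rate in WIN_RATES.items():
        wins = round(rate * trials)
        p_value = binomial_two_sided_p(wins, trials)
        if p_value >= alpha:
            verdict = "not distinguishable from chance"
        elif rate > 0.5:
            verdict = "above chance"
        else:
            verdict = "below chance"
        results[system] = (p_value, verdict)
    return results


if __name__ == "__main__":
    for system, (p_value, verdict) in assess().items():
        print(f"{system}: p = {p_value:.4g} ({verdict})")
```

Under the assumed trial count, the verdicts follow the pattern described in the abstract: GPT-4.5 lands above chance, LLaMa-3.1 is not distinguishable from chance, and the two baselines fall below it. With a different number of trials the p-values, and possibly the verdict for rates near 50%, would change.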
