Analyzing the Performance of GPT-3.5 and GPT-4 in Grammatical Error Correction - 专知论文

会员服务 ·

0

Performer · GPT-4 · MoDELS · Prompt · 得分 ·

2023 年 5 月 30 日

Analyzing the Performance of GPT-3.5 and GPT-4 in Grammatical Error Correction

翻译：分析GPT-3.5与GPT-4在语法纠错任务中的性能表现

Steven Coyne,Keisuke Sakaguchi,Diana Galvan-Sosa,Michael Zock,Kentaro Inui

GPT-3 and GPT-4 models are powerful, achieving high performance on a variety of Natural Language Processing tasks. However, there is a relative lack of detailed published analysis of their performance on the task of grammatical error correction (GEC). To address this, we perform experiments testing the capabilities of a GPT-3.5 model (text-davinci-003) and a GPT-4 model (gpt-4-0314) on major GEC benchmarks. We compare the performance of different prompts in both zero-shot and few-shot settings, analyzing intriguing or problematic outputs encountered with different prompt formats. We report the performance of our best prompt on the BEA-2019 and JFLEG datasets, finding that the GPT models can perform well in a sentence-level revision setting, with GPT-4 achieving a new high score on the JFLEG benchmark. Through human evaluation experiments, we compare the GPT models' corrections to source, human reference, and baseline GEC system sentences and observe differences in editing strategies and how they are scored by human raters.

翻译：GPT-3和GPT-4模型功能强大，在多种自然语言处理任务中均展现出卓越性能。然而，关于它们在语法纠错（GEC）任务中表现的具体分析却相对缺乏。为解决这一问题，我们通过实验测试了GPT-3.5模型（text-davinci-003）和GPT-4模型（gpt-4-0314）在主流GEC基准测试中的能力。我们比较了零样本和少样本设置下不同提示词的表现，分析了不同提示词格式中出现的引人关注或存在问题的输出结果。报告了在BEA-2019和JFLEG数据集上最优提示词的表现，发现GPT模型在句子级修订场景中表现优异，其中GPT-4在JFLEG基准测试中创下新高。通过人工评估实验，我们将GPT模型的纠错结果与原文、人工参考及基线GEC系统生成的句子进行了对比，观察到编辑策略差异及其对人类评分员评估结果的影响。

0

相关内容

Performer

LLM in Medical Domain: 大语言模型在医学领域的应用

LLM in Medical Domain: 大语言模型在医学领域的应用

专知会员服务

103+阅读 · 2023年6月17日

百篇论文纵览大型语言模型最新研究进展

百篇论文纵览大型语言模型最新研究进展

专知会员服务

70+阅读 · 2023年3月31日

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

专知会员服务

60+阅读 · 2022年4月22日

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

专知会员服务

94+阅读 · 2020年2月12日

【下载】Python自然语言处理实战书籍和代码《Natural Language Processing in Action》

【下载】Python自然语言处理实战书籍和代码《Natural Language Processing in Action》

专知会员服务

80+阅读 · 2019年10月27日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

60+阅读 · 2019年10月17日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

80+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

84+阅读 · 2019年10月9日

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

图与推荐

2+阅读 · 2022年11月2日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

上百种预训练中文词向量：Chinese-Word-Vectors

上百种预训练中文词向量：Chinese-Word-Vectors

AINLP

23+阅读 · 2019年2月26日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

专知

13+阅读 · 2018年6月24日

【论文推荐】最新6篇视觉问答（VQA）相关论文—目标推理、深度循环模型、可解释性、数据可视化、Triplet学习、基准

【论文推荐】最新6篇视觉问答（VQA）相关论文—目标推理、深度循环模型、可解释性、数据可视化、Triplet学习、基准

专知

15+阅读 · 2018年2月3日

最新5篇生成对抗网络相关论文推荐—FusedGAN、DeblurGAN、AdvGAN、CipherGAN、MMD GANS

最新5篇生成对抗网络相关论文推荐—FusedGAN、DeblurGAN、AdvGAN、CipherGAN、MMD GANS

专知

23+阅读 · 2018年1月18日

TRAF3IP3调控T细胞活性与肿瘤免疫的分子机制

国家自然科学基金

0+阅读 · 2016年12月31日

AEG-1对外周T细胞淋巴瘤恶性生物学行为的影响及机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

(Ba,Ca)(Ti,Sn)O3多元体系无铅压电陶瓷的相结构与性能调控研究

国家自然科学基金

0+阅读 · 2014年12月31日

二维晶体系统的应变和输运性质研究

国家自然科学基金

0+阅读 · 2014年12月31日

Sb、Te掺杂Bi2Se3拓扑绝缘体纳米片的制备及表面态输运性质研究

国家自然科学基金

0+阅读 · 2013年12月31日

Calderon问题和边界刚性问题

国家自然科学基金

0+阅读 · 2013年12月31日

DFT+Gutzwiller方法研究过渡金属氧化物

国家自然科学基金

0+阅读 · 2012年12月31日

下一代无源光网络相位噪声机理及抑制方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

髓系抑制性细胞（MDSC）参与鼻咽癌免疫耐受的作用和调控机制

国家自然科学基金

0+阅读 · 2012年12月31日

Co基Heusler合金半金属磁性隧道结界面特性及电子极化输运性质研究

国家自然科学基金

0+阅读 · 2011年12月31日

Evaluating GPT-3.5 and GPT-4 on Grammatical Error Correction for Brazilian Portuguese

Arxiv

0+阅读 · 2023年7月18日

Unveiling Gender Bias in Terms of Profession Across LLMs: Analyzing and Addressing Sociological Implications

Arxiv

0+阅读 · 2023年7月18日

A Survey on Evaluation of Large Language Models

Arxiv

0+阅读 · 2023年7月18日

Emotional Intelligence of Large Language Models

Arxiv

0+阅读 · 2023年7月18日

How is ChatGPT's behavior changing over time?

Arxiv

0+阅读 · 2023年7月18日

Extending the Frontier of ChatGPT: Code Generation and Debugging

Arxiv

0+阅读 · 2023年7月17日

Assessing the Quality of Multiple-Choice Questions Using GPT-4 and Rule-Based Methods

Arxiv

0+阅读 · 2023年7月16日

The Potential and Pitfalls of using a Large Language Model such as ChatGPT or GPT-4 as a Clinical Assistant

Arxiv

0+阅读 · 2023年7月16日

Diversity Over Size: On the Effect of Sample and Topic Sizes for Argument Mining Datasets

Arxiv

0+阅读 · 2023年7月15日

Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond

Arxiv

12+阅读 · 2023年4月26日

VIP会员

文章信息

相关主题

最新内容

《面向指挥控制训练与实时北约兼容数据分发的战术模拟器》

《面向指挥控制训练与实时北约兼容数据分发的战术模拟器》

专知会员服务

3+阅读 · 今天5:21

《决策模型比较研究》

《决策模型比较研究》

专知会员服务

8+阅读 · 今天5:16

全球军事与武器工业中的人工智能：应用、方法与影响（万字长文）

全球军事与武器工业中的人工智能：应用、方法与影响（万字长文）

专知会员服务

4+阅读 · 今天4:37

《美军水下战与海床战概述及本地实施》

《美军水下战与海床战概述及本地实施》

专知会员服务

4+阅读 · 今天4:30

面向未来冲突推进陆军情报体制改革

面向未来冲突推进陆军情报体制改革

专知会员服务

4+阅读 · 今天4:12

人工智能赋能无人机：俄乌冲突案例及其深远影响（万字长文）

人工智能赋能无人机：俄乌冲突案例及其深远影响（万字长文）

专知会员服务

5+阅读 · 今天2:54

《反无人机蜂群：有人-无人协同防御场景下的编队重构分析》

《反无人机蜂群：有人-无人协同防御场景下的编队重构分析》

专知会员服务

9+阅读 · 7月24日

《史诗怒火/咆哮雄狮行动：针对伊朗空中战役的战略分析》68页智库报告

《史诗怒火/咆哮雄狮行动：针对伊朗空中战役的战略分析》68页智库报告

专知会员服务

8+阅读 · 7月24日

“愈演愈烈的欺骗与干扰博弈”：无人机与人工智能背景下俄乌强化以无人机为核心的电子战

“愈演愈烈的欺骗与干扰博弈”：无人机与人工智能背景下俄乌强化以无人机为核心的电子战

专知会员服务

5+阅读 · 7月24日

乌克兰纵深打击如何重塑俄罗斯的战略选择

乌克兰纵深打击如何重塑俄罗斯的战略选择

专知会员服务

3+阅读 · 7月24日

《分布式太空任务对比分析与综合建模及仿真环境》120页

《分布式太空任务对比分析与综合建模及仿真环境》120页

专知会员服务

4+阅读 · 7月24日

俄乌战争中关于中程打击无人机部署的经验启示

俄乌战争中关于中程打击无人机部署的经验启示

专知会员服务

5+阅读 · 7月24日

《远程自主系统可扩展态势感知的解决方案》32页2026最新报告

《远程自主系统可扩展态势感知的解决方案》32页2026最新报告

专知会员服务

7+阅读 · 7月23日

《基于强化学习的自动化红队测试》

《基于强化学习的自动化红队测试》

专知会员服务

6+阅读 · 7月23日

《下一代无人机-卫星通信：人工智能创新与未来展望》32页长综述

《下一代无人机-卫星通信：人工智能创新与未来展望》32页长综述

专知会员服务

9+阅读 · 7月23日

相关VIP内容

LLM in Medical Domain: 大语言模型在医学领域的应用

LLM in Medical Domain: 大语言模型在医学领域的应用

专知会员服务

103+阅读 · 2023年6月17日

百篇论文纵览大型语言模型最新研究进展

百篇论文纵览大型语言模型最新研究进展

专知会员服务

70+阅读 · 2023年3月31日

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

专知会员服务

60+阅读 · 2022年4月22日

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

专知会员服务

94+阅读 · 2020年2月12日

【下载】Python自然语言处理实战书籍和代码《Natural Language Processing in Action》

【下载】Python自然语言处理实战书籍和代码《Natural Language Processing in Action》

专知会员服务

80+阅读 · 2019年10月27日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

60+阅读 · 2019年10月17日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

80+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

84+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《决策模型比较研究》

《美军水下战与海床战概述及本地实施》

《面向指挥控制训练与实时北约兼容数据分发的战术模拟器》

全球军事与武器工业中的人工智能：应用、方法与影响（万字长文）

相关资讯

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

图与推荐

2+阅读 · 2022年11月2日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

上百种预训练中文词向量：Chinese-Word-Vectors

上百种预训练中文词向量：Chinese-Word-Vectors

AINLP

23+阅读 · 2019年2月26日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

专知

13+阅读 · 2018年6月24日

【论文推荐】最新6篇视觉问答（VQA）相关论文—目标推理、深度循环模型、可解释性、数据可视化、Triplet学习、基准

【论文推荐】最新6篇视觉问答（VQA）相关论文—目标推理、深度循环模型、可解释性、数据可视化、Triplet学习、基准

专知

15+阅读 · 2018年2月3日

最新5篇生成对抗网络相关论文推荐—FusedGAN、DeblurGAN、AdvGAN、CipherGAN、MMD GANS

最新5篇生成对抗网络相关论文推荐—FusedGAN、DeblurGAN、AdvGAN、CipherGAN、MMD GANS

专知

23+阅读 · 2018年1月18日

相关论文

Evaluating GPT-3.5 and GPT-4 on Grammatical Error Correction for Brazilian Portuguese

Arxiv

0+阅读 · 2023年7月18日

Unveiling Gender Bias in Terms of Profession Across LLMs: Analyzing and Addressing Sociological Implications

Arxiv

0+阅读 · 2023年7月18日

A Survey on Evaluation of Large Language Models

Arxiv

0+阅读 · 2023年7月18日

Emotional Intelligence of Large Language Models

Arxiv

0+阅读 · 2023年7月18日

How is ChatGPT's behavior changing over time?

Arxiv

0+阅读 · 2023年7月18日

Extending the Frontier of ChatGPT: Code Generation and Debugging

Arxiv

0+阅读 · 2023年7月17日

Assessing the Quality of Multiple-Choice Questions Using GPT-4 and Rule-Based Methods

Arxiv

0+阅读 · 2023年7月16日

The Potential and Pitfalls of using a Large Language Model such as ChatGPT or GPT-4 as a Clinical Assistant

Arxiv

0+阅读 · 2023年7月16日

Diversity Over Size: On the Effect of Sample and Topic Sizes for Argument Mining Datasets

Arxiv

0+阅读 · 2023年7月15日

Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond

Arxiv

12+阅读 · 2023年4月26日

相关基金

TRAF3IP3调控T细胞活性与肿瘤免疫的分子机制

国家自然科学基金

0+阅读 · 2016年12月31日

AEG-1对外周T细胞淋巴瘤恶性生物学行为的影响及机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

(Ba,Ca)(Ti,Sn)O3多元体系无铅压电陶瓷的相结构与性能调控研究

国家自然科学基金

0+阅读 · 2014年12月31日

二维晶体系统的应变和输运性质研究

国家自然科学基金

0+阅读 · 2014年12月31日

Sb、Te掺杂Bi2Se3拓扑绝缘体纳米片的制备及表面态输运性质研究

国家自然科学基金

0+阅读 · 2013年12月31日

Calderon问题和边界刚性问题

国家自然科学基金

0+阅读 · 2013年12月31日

DFT+Gutzwiller方法研究过渡金属氧化物

国家自然科学基金

0+阅读 · 2012年12月31日

下一代无源光网络相位噪声机理及抑制方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

髓系抑制性细胞（MDSC）参与鼻咽癌免疫耐受的作用和调控机制

国家自然科学基金

0+阅读 · 2012年12月31日

Co基Heusler合金半金属磁性隧道结界面特性及电子极化输运性质研究

国家自然科学基金

0+阅读 · 2011年12月31日

微信扫码咨询专知VIP会员