GPT-4 to GPT-3.5: 'Hold My Scalpel' -- A Look at the Competency of OpenAI's GPT on the Plastic Surgery In-Service Training Exam

The Plastic Surgery In-Service Training Exam (PSITE) is an important indicator of resident proficiency and serves as a useful benchmark for evaluating OpenAI's GPT. Unlike many of the simulated tests or practice questions shown in the GPT-4 Technical Paper, the multiple-choice questions evaluated here are authentic PSITE questions. These questions present realistic clinical vignettes that a plastic surgeon commonly encounters in practice, and PSITE scores correlate highly with passing the written boards required to become a Board Certified Plastic Surgeon. Our evaluation shows a dramatic improvement of GPT-4 (without vision) over GPT-3.5 on both exams: the score rose from the 8th to the 88th percentile on the 2022 exam and from the 3rd to the 99th percentile on the 2021 exam. The final results of the 2023 PSITE are set to be released on April 11, 2023, which makes this an exciting moment to continue our research with a fresh exam. Our evaluation pipeline is ready to run as soon as the exam is released, provided we have access to the GPT-4 API via OpenAI. With multimodal input, we may achieve superhuman performance on the 2023 exam.

