The Plastic Surgery In-Service Training Exam (PSITE) is an important indicator of resident proficiency and a useful benchmark for evaluating OpenAI's GPT models. Unlike many of the simulated tests and practice questions reported in the GPT-4 Technical Report, the multiple-choice questions evaluated here are authentic PSITE questions. They present realistic clinical vignettes of the kind plastic surgeons commonly encounter in practice, and PSITE scores correlate highly with passing the written board examination required to become a board-certified plastic surgeon. Our evaluation shows a dramatic improvement of GPT-4 (without vision) over GPT-3.5. On the 2022 exam, the score rose from the 8th to the 88th percentile, and on the 2021 exam, it rose from the 3rd to the 99th percentile. The final results of the 2023 PSITE are scheduled for release on April 11, 2023, which offers a timely opportunity to extend this research to a fresh exam. Our evaluation pipeline will be ready to run as soon as the exam is released, provided we have access to the GPT-4 API through OpenAI. With multimodal input, GPT-4 may achieve superhuman performance on the 2023 exam.