Leveraging GPT-4 for Food Effect Summarization to Enhance Product-Specific Guidance Development via Iterative Prompting

Food effect summarization from New Drug Application (NDA) is an essential component of product-specific guidance (PSG) development and assessment. However, manual summarization of food effect from extensive drug application review documents is time-consuming, which arouses a need to develop automated methods. Recent advances in large language models (LLMs) such as ChatGPT and GPT-4, have demonstrated great potential in improving the effectiveness of automated text summarization, but its ability regarding the accuracy in summarizing food effect for PSG assessment remains unclear. In this study, we introduce a simple yet effective approach, iterative prompting, which allows one to interact with ChatGPT or GPT-4 more effectively and efficiently through multi-turn interaction. Specifically, we propose a three-turn iterative prompting approach to food effect summarization in which the keyword-focused and length-controlled prompts are respectively provided in consecutive turns to refine the quality of the generated summary. We conduct a series of extensive evaluations, ranging from automated metrics to FDA professionals and even evaluation by GPT-4, on 100 NDA review documents selected over the past five years. We observe that the summary quality is progressively improved throughout the process. Moreover, we find that GPT-4 performs better than ChatGPT, as evaluated by FDA professionals (43% vs. 12%) and GPT-4 (64% vs. 35%). Importantly, all the FDA professionals unanimously rated that 85% of the summaries generated by GPT-4 are factually consistent with the golden reference summary, a finding further supported by GPT-4 rating of 72% consistency. These results strongly suggest a great potential for GPT-4 to draft food effect summaries that could be reviewed by FDA professionals, thereby improving the efficiency of PSG assessment cycle and promoting the generic drug product development.

翻译：从新药申请（NDA）中提炼食物效应总结是产品特定指南（PSG）开发与评估的关键环节。然而，基于庞杂药物申请评审文档的人工食物效应总结耗时费力，亟需开发自动化方法。以ChatGPT和GPT-4为代表的大语言模型（LLMs）在提升自动文本总结效率方面展现出巨大潜力，但其在PSG评估中食物效应总结的准确性仍存疑问。本研究提出一种简单有效的方法——迭代提示，通过多轮交互实现与ChatGPT或GPT-4更高效的互动。具体而言，我们设计了三轮迭代提示策略：在连续交互中分别提供关键词聚焦提示和长度控制提示，逐步优化生成摘要的质量。我们选取过去五年间的100份NDA评审文档，开展了从自动评估指标到FDA专业人员评审及GPT-4自评的全面验证。结果表明，摘要质量随迭代过程逐步提升。此外，FDA专业人员评估（43% vs. 12%）与GPT-4自评（64% vs. 35%）均显示GPT-4表现优于ChatGPT。值得注意的是，所有FDA专业人员一致认为GPT-4生成的85%摘要与黄金标准参考文献事实一致，而GPT-4自评的72%一致性进一步印证了这一发现。这些结果充分表明，GPT-4在起草可经FDA专业人员审阅的食物效应总结方面具有巨大潜力，从而有望提升PSG评估周期效率，促进仿制药产品开发。

相关内容

GPT-4

关注 29

北京时间2023年3月15日凌晨，ChatGPT开发商OpenAI 发布了发布了全新的多模态预训练大模型 GPT-4，可以更可靠、更具创造力、能处理更细节的指令，根据图片和文字提示都能生成相应内容。具体来说来说，GPT-4 相比上一代的模型，实现了飞跃式提升：支持图像和文本输入，拥有强大的识图能力；大幅提升了文字输入限制，在ChatGPT模式下，GPT-4可以处理超过2.5万字的文本，可以处理一些更加细节的指令；回答准确性也得到了显著提高。

【CVPR 2022】基于元内存传输的跨域少镜头语义分割，Remember the Difference: Cross-Domain Few-Shot Semantic Segmentation via Meta-Memory Transfer

专知会员服务

13+阅读 · 2022年3月12日

【NeurIPS2021】用于文本图表示学习的 GNN 嵌套 Transformer 模型：GraphFormers

专知会员服务

46+阅读 · 2021年11月24日

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日