Large language models (LMs) such as GPT-3 are powerful, but they can make mistakes that are obvious to humans. For example, GPT-3 may mistakenly interpret "What word is similar to good?" as a request for a homophone when the user intended a synonym. Our goal is to correct such errors effectively through user interaction with the system, without retraining, which would be prohibitively costly. We pair GPT-3 with a growing memory of recorded cases in which the model misunderstood the user's intent, along with the user's clarifying feedback. This memory allows our system to produce an enhanced prompt for any new query, based on the user feedback given to correct errors on similar past cases. On four tasks (two lexical tasks and two advanced ethical reasoning tasks), we show how a (simulated) user can interactively teach a deployed GPT-3, substantially increasing its accuracy on queries that GPT-3 misunderstands in different ways. Our approach is a step towards low-cost utility enhancement for very large pre-trained LMs. Code, data, and instructions for implementing MEMPROMPT for a new task are available at https://www.memprompt.com/.
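The memory-and-prompt loop described above can be illustrated with a minimal sketch. It is not the released MEMPROMPT implementation: the class `FeedbackMemory`, the function `build_prompt`, the similarity threshold, and the prompt format are all hypothetical, and string similarity from Python's standard `difflib` stands in for whatever retrieval method the actual system uses.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class FeedbackMemory:
    """Growing store of (misunderstood query, user feedback) pairs.

    Hypothetical helper for illustration only.
    """
    entries: list = field(default_factory=list)
    threshold: float = 0.6  # assumed similarity cutoff, not from the paper

    def add(self, query: str, feedback: str) -> None:
        # Record a case where the model misread the user's intent.
        self.entries.append((query, feedback))

    def lookup(self, query: str):
        # Return feedback from the most similar past query, or None
        # if nothing in memory is similar enough.
        best_score, best_feedback = 0.0, None
        for past_query, feedback in self.entries:
            score = SequenceMatcher(
                None, query.lower(), past_query.lower()
            ).ratio()
            if score > best_score:
                best_score, best_feedback = score, feedback
        return best_feedback if best_score >= self.threshold else None


def build_prompt(query: str, memory: FeedbackMemory) -> str:
    # Attach retrieved feedback to the query to form an enhanced prompt;
    # fall back to the plain query when memory has nothing relevant.
    feedback = memory.lookup(query)
    if feedback is None:
        return query
    return f"{query} | clarification: {feedback}"


memory = FeedbackMemory()
memory.add(
    "What word is similar to good?",
    "'similar to' means 'has a similar meaning to'",
)
# A new query phrased like a past misunderstanding picks up the feedback.
print(build_prompt("What word is similar to happy?", memory))
```

The enhanced prompt would then be sent to the model in place of the raw query. Because only the memory grows, the model's weights stay untouched, which is the property that avoids retraining.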