Natural language (NL) feedback contains rich information about the user experience. Existing studies focus on an instance-level approach, where feedback is used to refine specific examples, disregarding its system-wide application. This paper proposes a general framework for unlocking the system-level use of NL feedback. We show how to use feedback to formalize system-level design decisions in a human-in-the-loop process in order to produce better models. In particular, this is done through: (i) metric design for tasks; and (ii) language model prompt design for refining model responses. We conduct two case studies of this approach, improving search query generation and dialog response generation, and demonstrate the effectiveness of system-level feedback. We show that combining system-level and instance-level feedback brings further gains, and that human-written instance-level feedback results in more grounded refinements than GPT-3.5-written feedback, underscoring the importance of human feedback for building systems.