Natural language (NL) feedback offers rich insights into user experience. While existing studies focus on an instance-level approach, where feedback is used to refine specific examples, we introduce a framework for system-level use of NL feedback. We show how to use feedback to formalize system-level design decisions in a human-in-the-loop process in order to produce better models. In particular, this is done through: (i) metric design for tasks; and (ii) language model prompt design for refining model responses. We conduct two case studies of this approach for improving search query and dialog response generation, demonstrating the effectiveness of system-level feedback. We show that combining system-level and instance-level feedback brings further gains, and that human-written instance-level feedback results in more grounded refinements than GPT-3.5-written feedback, underscoring the importance of human feedback for building systems. We release our code and data at https://github.com/yyy-Apple/Sys-NL-Feedback.