Human feedback is increasingly used to steer the behaviours of Large Language Models (LLMs). However, it is unclear how to collect and incorporate feedback in a way that is efficient, effective and unbiased, especially for highly subjective human preferences and values. In this paper, we survey existing approaches for learning from human feedback, drawing on 95 papers primarily from the ACL and arXiv repositories. First, we summarise the past, pre-LLM trends for integrating human feedback into language models. Second, we give an overview of present techniques and practices, covering the motivations for using feedback, the conceptual frameworks for defining values and preferences, and how feedback is collected and from whom. Finally, we encourage a better future of feedback learning in LLMs by raising five unresolved conceptual and practical challenges.