Many recent advances in natural language generation have been fueled by training large language models on internet-scale data. However, this paradigm can lead to models that generate toxic, inaccurate, and unhelpful content, and automatic evaluation metrics often fail to identify these behaviors. As models become more capable, human feedback is an invaluable signal for evaluating and improving them. This survey provides an overview of recent research that has leveraged human feedback to improve natural language generation. First, we introduce an encompassing formalization of feedback, and identify and organize existing research into a taxonomy following this formalization. Next, we discuss how feedback can be described by its format and objective, and cover the two approaches proposed to use feedback (either for training or decoding): directly using the feedback or training feedback models. We also discuss existing datasets of human-feedback data, and concerns surrounding feedback collection. Finally, we provide an overview of the nascent field of AI feedback, which exploits large language models to make judgments based on a set of principles and minimize the need for human intervention.