Customizing LLMs for a specific task requires distinguishing high-quality responses from lower-quality ones. This ability can be instilled through supervised fine-tuning on extensive human preference data, but obtaining large volumes of expert-annotated data is costly for most tasks. In this paper, we explore a novel method for optimizing LLMs with ranking metrics: the model is trained to prioritize the best responses from a pool of candidates generated for the task. Rather than a traditional full ordering, we advocate a partial ordering of candidates, since reaching consensus on a perfect order of responses is often difficult. A partial ordering is more robust, less sensitive to noise, and attainable with limited human annotations or through heuristic methods. We evaluate our system's improved response generation on benchmark datasets, including textual entailment and multi-document question answering. Ablation studies examine crucial factors: how to gather candidate responses for a given task, how to determine their most suitable order, and how to balance supervised fine-tuning with ranking metrics. Our approach, named Rescue, offers a promising avenue for enhancing the response generation and task accuracy of LLMs.
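To make the idea concrete, the following is a minimal sketch of how a partial-order ranking term might be balanced with an SFT loss. This is an illustrative assumption, not the paper's exact Rescue objective: candidates are grouped into preference tiers (e.g., acceptable vs. unacceptable), only cross-tier pairs incur a margin penalty, and same-tier pairs are never compared, so no full ordering is required. The function names, margin formulation, and mixing weight `alpha` are all hypothetical.

```python
def partial_order_ranking_loss(scores, tiers, margin=1.0):
    """Hinge-style ranking loss over a partial ordering (illustrative sketch).

    scores[i]: model score for candidate i.
    tiers[i]:  preference tier for candidate i (smaller = better).
    Only pairs from strictly different tiers contribute; pairs within
    the same tier are left uncompared, which is what makes the
    ordering partial rather than full.
    """
    loss, pairs = 0.0, 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if tiers[i] < tiers[j]:  # i is strictly preferred over j
                loss += max(0.0, margin - (scores[i] - scores[j]))
                pairs += 1
    return loss / max(pairs, 1)


def combined_loss(sft_loss, scores, tiers, alpha=0.5, margin=1.0):
    # Balance supervised fine-tuning with the ranking term; alpha is a
    # hypothetical trade-off weight, one of the factors an ablation
    # study would tune.
    return sft_loss + alpha * partial_order_ranking_loss(scores, tiers, margin)
```

For example, if the preferred candidate already scores well above every lower-tier candidate by at least the margin, the ranking term vanishes and only the SFT loss remains.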