As large language models improve, there is increasing interest in techniques that leverage these models' capabilities to refine their own outputs. In this work, we introduce Shepherd, a language model specifically tuned to critique responses and suggest refinements, extending beyond the capabilities of an untuned model to identify diverse errors and provide suggestions to remedy them. At the core of our approach is a high-quality feedback dataset, which we curate from community feedback and human annotations. Even though Shepherd is small (7B parameters), its critiques are either equivalent to or preferred over those from established models, including ChatGPT. Using GPT-4 for evaluation, Shepherd reaches an average win rate of 53-87% compared to competitive alternatives. In human evaluation, Shepherd strictly outperforms other models and, on average, closely ties with ChatGPT.