This paper shows that alignment methods achieve stronger adherence to predefined guidelines, or 'guardrails', than instruction fine-tuning alone in conversational agents (bots). It examines traditional training approaches such as instruction fine-tuning alongside recent direct alignment methods such as Identity Preference Optimization (IPO) and Kahneman-Tversky Optimization (KTO). The effectiveness of alignment techniques applied both before and after instruction tuning is highlighted, illustrating their potential to optimize conversational bots in domains that require strict adherence to specified rules, such as customer care.