We propose a large language model (LLM)-based reward decomposition framework for aligning dialogue agents using only a single session-level feedback signal. We leverage the reasoning capabilities of a frozen, pretrained LLM to infer fine-grained, turn-level implicit rewards by decomposing the global, session-level feedback. Our first \emph{text-only} variant prompts the LLM to perform reward decomposition using only the dialogue transcript; the second \emph{multimodal} variant additionally incorporates behavioral cues, such as pitch, gaze, and facial affect, expressed as natural-language descriptions. The inferred turn-level rewards are distilled into a lightweight reward model, which we then use for RL-based fine-tuning of the dialogue agent. We evaluate both variants against state-of-the-art reward decomposition methods and demonstrate notable improvements in human evaluations of conversation quality, suggesting that LLMs are strong reward decomposers that obviate the need for manual reward shaping and granular human feedback.
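To make the decomposition step concrete, the following minimal Python sketch shows one way the prompting could work for both variants. It is an illustrative assumption, not the paper's exact protocol: the \texttt{llm\_complete} callable stands in for any frozen, pretrained LLM, and the prompt template, JSON output format, and sum-to-session normalization are hypothetical choices.
\begin{verbatim}
# Minimal sketch of LLM-based reward decomposition (illustrative only).
# Assumes an abstract llm_complete(prompt) -> str callable backed by a
# frozen, pretrained LLM; prompt wording and normalization are assumptions.
import json
from typing import Callable, List, Optional

def decompose_session_reward(
    transcript: List[str],            # dialogue turns, in order
    session_score: float,             # single session-level feedback signal
    llm_complete: Callable[[str], str],
    behavior_notes: Optional[List[str]] = None,  # multimodal cue descriptions
) -> List[float]:
    """Ask a frozen LLM to attribute a session-level score to turns."""
    lines = []
    for i, turn in enumerate(transcript):
        # Multimodal variant: append natural-language behavioral cues
        # (e.g., pitch, gaze, facial affect) to each turn.
        cue = f" [cues: {behavior_notes[i]}]" if behavior_notes else ""
        lines.append(f"Turn {i}: {turn}{cue}")
    prompt = (
        "The following dialogue received an overall quality score of "
        f"{session_score:.2f}.\n" + "\n".join(lines) + "\n\n"
        "Assign each turn a reward reflecting its contribution to the "
        "overall score. Respond with a JSON list of floats, one per turn."
    )
    rewards = json.loads(llm_complete(prompt))
    assert len(rewards) == len(transcript), "LLM must score every turn"
    # Rescale so turn-level rewards sum to the session-level signal,
    # a common consistency constraint in reward decomposition.
    total = sum(rewards) or 1.0
    return [session_score * r / total for r in rewards]
\end{verbatim}
Under this sketch, the inferred per-turn rewards would then serve as distillation targets for the lightweight reward model used in RL-based fine-tuning.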