LLMs have advanced tool-using agents for real-world applications, yet these agents often exhibit unexpected behaviors or produce unexpected results. Beyond obvious failures, the subtler issue of "intent deviation" severely hinders reliable evaluation and performance improvement. Existing post-training methods generally rely on either samples from real systems or virtual data simulated by LLMs. However, the former is costly because it depends on hand-crafted user requests, while the latter suffers from distribution shift relative to real tools in the wild. Moreover, neither provides negative samples tailored to intent-deviation scenarios, limiting the guidance available for preference learning. We introduce RISE, a "Real-to-Virtual" method designed to mitigate intent deviation. Anchored on verified tool primitives, RISE synthesizes virtual trajectories and generates diverse negative samples by mutating critical parameters. With the synthesized data, RISE fine-tunes backbone LLMs via two-stage training for intent alignment. Evaluation results demonstrate that the data synthesized by RISE achieves promising results on eight metrics covering user requests, execution trajectories, and agent responses. Combined with training, RISE achieves an average improvement of 35.28% in Acctask (task completion) and 23.27% in Accintent (intent alignment), outperforming SOTA baselines by 1.20--42.09% and 1.17--54.93%, respectively.