In recent years, powerful agentic workflows have been applied to a wide range of human problems. However, existing workflow orchestration still faces key challenges: high manual cost, reliance on specific operators or large language models (LLMs), and sparse reward signals. To address these challenges, we propose FlowSteer, an end-to-end reinforcement learning framework that pairs a lightweight policy model, acting as the agent, with an executable canvas environment, automating workflow orchestration through multi-turn interaction. In this process, the policy model analyzes execution states and selects editing actions, while the canvas executes operators and returns feedback for iterative refinement. Moreover, FlowSteer is plug-and-play, supporting diverse operator libraries and interchangeable LLM backends. To train this interaction paradigm effectively, we propose Canvas Workflow Relative Policy Optimization (CWRPO), which introduces diversity-constrained rewards with conditional release to stabilize learning and suppress shortcut behaviors. Experimental results on twelve datasets show that FlowSteer significantly outperforms baselines across diverse tasks.
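The multi-turn interaction described above can be sketched as a simple loop: the policy observes the canvas's execution state, emits an editing action, and the canvas applies it and returns feedback. The sketch below is illustrative only; the class and method names (`Canvas`, `ToyPolicy`, `orchestrate`) are hypothetical placeholders, not FlowSteer's actual interfaces.

```python
from dataclasses import dataclass, field

@dataclass
class Canvas:
    """Hypothetical executable environment holding the workflow being edited."""
    workflow: list = field(default_factory=list)

    def execute(self):
        """Run the current operators and return an execution-state summary."""
        return {"n_ops": len(self.workflow), "ops": list(self.workflow)}

    def apply(self, action):
        """Apply an editing action, ('add', op) or ('remove', op),
        then return fresh feedback for the next turn."""
        kind, op = action
        if kind == "add":
            self.workflow.append(op)
        elif kind == "remove" and op in self.workflow:
            self.workflow.remove(op)
        return self.execute()

class ToyPolicy:
    """Stand-in for the lightweight policy model: extends the workflow
    according to a fixed plan, then terminates."""
    def __init__(self, plan):
        self.plan = plan

    def act(self, state):
        n = state["n_ops"]
        if n < len(self.plan):
            return ("add", self.plan[n])
        return None  # no further edits: stop the episode

def orchestrate(policy, canvas, max_turns=10):
    """Multi-turn loop: observe state, pick an edit, apply it, repeat."""
    state = canvas.execute()
    for _ in range(max_turns):
        action = policy.act(state)
        if action is None:
            break
        state = canvas.apply(action)
    return state
```

In the full framework, `ToyPolicy` would be replaced by a trained LLM policy and the canvas would invoke real operators, with the returned feedback driving both iterative refinement and the CWRPO reward signal.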