Large Language Models have shown strong capabilities in complex problem solving, yet many agentic systems remain difficult to interpret and control because their internal workflows are opaque. While some frameworks expose explicit architectures for collaboration, many deployed agentic systems appear to users as black boxes. We address this by introducing Agentic Workflow Reconstruction (AWR), a new task that aims to synthesize an explicit, interpretable stand-in workflow that approximates a black-box system using only input-output access. We propose AgentXRay, a search-based framework that formulates AWR as a combinatorial optimization problem over discrete agent roles and tool invocations in a chain-structured workflow space. Unlike model distillation, AgentXRay produces editable white-box workflows that match target outputs under an observable, output-based proxy metric, without accessing model parameters. To navigate the vast search space, AgentXRay employs Monte Carlo Tree Search enhanced by a scoring-based Red-Black Pruning mechanism, which dynamically integrates proxy quality with search depth. Experiments across diverse domains demonstrate that, compared to unpruned search, AgentXRay achieves higher proxy similarity and reduces token consumption, enabling deeper workflow exploration under fixed iteration budgets.
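To make the search formulation concrete, the following is a minimal sketch of Monte Carlo Tree Search over chain-structured workflows with a score-based pruning step. It is an illustration under stated assumptions, not AgentXRay's implementation: the primitive names, the toy `run_chain` executor, the `SequenceMatcher` proxy metric, and the pruning rule in `prune_score` and `mark_red_black` (mean proxy quality plus a depth bonus, keeping the top-ranked siblings) are all hypothetical stand-ins chosen for brevity.

```python
import math
import random
from difflib import SequenceMatcher

# Hypothetical primitive set: agent roles and tool calls (names are illustrative).
PRIMITIVES = ["planner", "coder", "search_tool", "critic"]
MAX_DEPTH = 3


def run_chain(chain, task):
    """Toy stand-in for executing a workflow: each step appends a tag to the text."""
    out = task
    for step in chain:
        out += f"|{step}"
    return out


def proxy_similarity(candidate_out, target_out):
    """Output-based proxy in [0, 1]; SequenceMatcher is an assumed choice of metric."""
    return SequenceMatcher(None, candidate_out, target_out).ratio()


class Node:
    def __init__(self, chain, parent=None):
        self.chain = chain
        self.parent = parent
        self.children = []
        self.visits = 0
        self.total = 0.0
        self.pruned = False  # True means "red" (pruned), False means "black" (kept)

    def mean(self):
        return self.total / self.visits if self.visits else 0.0


def prune_score(node, depth_weight=0.2):
    """Assumed scoring rule: mean proxy quality plus a bonus that grows with depth."""
    return node.mean() + depth_weight * len(node.chain) / MAX_DEPTH


def mark_red_black(children, keep=2):
    """Re-rank siblings by prune_score; the top `keep` stay black, the rest turn red."""
    ranked = sorted(children, key=prune_score, reverse=True)
    for i, child in enumerate(ranked):
        child.pruned = i >= keep


def select(node, c=1.4):
    """UCB1 selection restricted to non-pruned children."""
    live = [ch for ch in node.children if not ch.pruned]
    log_n = math.log(node.visits + 1)
    return max(live, key=lambda ch: ch.mean() + c * math.sqrt(log_n / (ch.visits + 1e-9)))


def search(task, target_out, iterations=200, seed=0):
    """Return the best chain found and its proxy similarity to the target output."""
    rng = random.Random(seed)
    root = Node([])
    best_chain = []
    best_score = proxy_similarity(run_chain([], task), target_out)
    for _ in range(iterations):
        node = root
        # Selection and expansion: walk down until an unvisited node or full depth.
        while len(node.chain) < MAX_DEPTH:
            if not node.children:
                node.children = [Node(node.chain + [p], node) for p in PRIMITIVES]
            node = select(node)
            if node.visits == 0:
                break
        # Rollout: complete the chain at random, then score it against the target.
        tail = [rng.choice(PRIMITIVES) for _ in range(MAX_DEPTH - len(node.chain))]
        rollout = node.chain + tail
        score = proxy_similarity(run_chain(rollout, task), target_out)
        if score > best_score:
            best_chain, best_score = rollout, score
        # Backpropagation of the proxy score to the root.
        cur = node
        while cur is not None:
            cur.visits += 1
            cur.total += score
            cur = cur.parent
        # Prune among siblings once every sibling has been visited at least once.
        siblings = node.parent.children
        if all(ch.visits > 0 for ch in siblings):
            mark_red_black(siblings)
    return best_chain, best_score


if __name__ == "__main__":
    # The "black box" here is just a hidden chain whose output we can observe.
    target = run_chain(["planner", "search_tool", "coder"], "task")
    print(search("task", target))
```

In this sketch the black-box system is only queried for its output (`target`), which mirrors the input-output access assumption of AWR. Pruning is re-evaluated after each backpropagation, so a sibling marked red can return to black if the ranking changes. The actual Red-Black Pruning rule may differ in both its score and its schedule.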