Agentic AI will be an essential enabling technology for designing future mobile communication systems, which could provide flexible and customized services, automate complex network operations, and drive autonomous decision-making across the network. This work studies how Large Language Model (LLM)-based network AI agents can execute network procedures expressed as sequences of tool invocations. We investigate four approaches that differ in how the agent obtains the procedure and in how execution is divided between the agent and the underlying tools. Using a User Equipment (UE) IP allocation procedure as a case study, we evaluate the latency and execution correctness of each approach. We also conduct a stress test to examine how many sequential procedural steps an LLM agent can reliably execute before failing. Our results show that approaches relying on iterative agent-side reasoning incur higher latency and are more prone to execution errors. In contrast, approaches that encapsulate the procedure within a single tool, which internally orchestrates the required steps by invoking other tools, reduce latency by limiting repeated reasoning. The stress-test results further show that the model with advanced tool-calling capability maintains reliable execution over longer procedures than the other evaluated models; however, the reliability of every model degrades as procedure length increases, revealing clear execution limits in multi-step tool-based workflows. To systematically analyze failures in procedure execution, we introduce a procedure-specific error taxonomy that categorizes deviations in multi-step procedural execution.
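The contrast between agent-side iteration and a single encapsulating tool can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from this work: the three tool names (`authenticate_ue`, `select_ip_pool`, `assign_ip`), the contents of the UE IP allocation procedure, the address pool, and the use of a simple counter as a stand-in for LLM reasoning turns. No actual LLM is called.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, object]
Tool = Callable[[State], State]

# Hypothetical tools; the real procedure steps are not specified in the abstract.
def authenticate_ue(state: State) -> State:
    state["authenticated"] = True
    return state

def select_ip_pool(state: State) -> State:
    if not state.get("authenticated"):
        raise RuntimeError("UE must be authenticated before pool selection")
    state["pool"] = ["10.0.0.2", "10.0.0.3"]  # assumed example addresses
    return state

def assign_ip(state: State) -> State:
    pool = state["pool"]
    assert isinstance(pool, list)
    state["ip"] = pool[0]
    return state

# The procedure expressed as an ordered sequence of tool invocations.
PROCEDURE: List[Tool] = [authenticate_ue, select_ip_pool, assign_ip]

def run_agent_side(state: State, procedure: List[Tool]) -> Tuple[State, int]:
    """Agent invokes each tool itself: one reasoning turn per step."""
    turns = 0
    for tool in procedure:
        turns += 1  # stand-in for one LLM reasoning + tool-call turn
        state = tool(state)
    return state, turns

def allocate_ue_ip(state: State) -> State:
    """Single tool that orchestrates the steps internally, without the agent."""
    for tool in PROCEDURE:
        state = tool(state)
    return state

def run_encapsulated(state: State) -> Tuple[State, int]:
    """Agent makes a single tool call, so only one reasoning turn is needed."""
    return allocate_ue_ip(state), 1

if __name__ == "__main__":
    print(run_agent_side({"ue_id": "ue-1"}, PROCEDURE))
    print(run_encapsulated({"ue_id": "ue-1"}))
```

In this sketch both paths reach the same final state, but the agent-side path spends one reasoning turn per step while the encapsulated path spends one in total, which mirrors the latency argument made above. It does not model execution errors such as skipped or reordered steps.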