ELLMPEG: An Edge-based Agentic LLM Video Processing Tool

Large language models (LLMs), the foundation of generative AI systems like ChatGPT, are transforming many fields and applications, including multimedia, enabling more advanced content generation, analysis, and interaction. However, cloud-based LLM deployments face three key limitations: high computational and energy demands, privacy and reliability risks from remote processing, and recurring API costs. Recent advances in agentic AI, especially in structured reasoning and tool use, offer a better way to exploit open and locally deployed tools and LLMs. This paper presents ELLMPEG, an edge-enabled agentic LLM framework for the automated generation of video-processing commands. ELLMPEG integrates tool-aware Retrieval-Augmented Generation (RAG) with iterative self-reflection to produce and locally verify executable FFmpeg and VVenC commands directly at the edge, eliminating reliance on external cloud APIs. To evaluate ELLMPEG, we collect a dedicated prompt dataset of 480 diverse queries covering different categories of commands for FFmpeg and for VVenC, the Versatile Video Codec (VVC) encoder. We validate command-generation accuracy and compare four open-source LLMs on command validity, tokens generated per second, inference time, and energy efficiency. We also execute the generated commands to assess their runtime correctness and practical applicability. Experimental results show that Qwen2.5, when augmented with the ELLMPEG framework, achieves an average command-generation accuracy of 78% with zero recurring API cost, outperforming all other open-source models across both the FFmpeg and VVenC datasets.
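To make the retrieve, generate, verify, and reflect cycle described in the abstract more concrete, the following is a minimal sketch and not the ELLMPEG implementation. Every name in it (`retrieve_docs`, `generate_with_reflection`, `toy_generate`, `toy_validate`) is hypothetical, the keyword retriever merely stands in for tool-aware RAG, the stub generator stands in for a locally hosted LLM, and the validator is a toy syntactic check where a real setup would presumably run the tool locally.

```python
import shlex
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical signatures; none of these names come from the paper.
Generator = Callable[[str, List[str], Optional[str]], str]
Validator = Callable[[str], Tuple[bool, str]]


def retrieve_docs(query: str, corpus: Dict[str, str], top_k: int = 2) -> List[str]:
    """Toy keyword retriever standing in for tool-aware RAG."""
    words = set(query.lower().split())
    ranked = sorted(
        corpus.items(),
        key=lambda kv: len(words & set(kv[0].lower().split())),
        reverse=True,
    )
    return [text for _, text in ranked[:top_k]]


def generate_with_reflection(
    query: str,
    corpus: Dict[str, str],
    generate: Generator,
    validate: Validator,
    max_rounds: int = 3,
) -> Tuple[Optional[str], int]:
    """Generate a command, verify it locally, and retry with the error as feedback."""
    docs = retrieve_docs(query, corpus)
    feedback: Optional[str] = None
    for attempt in range(1, max_rounds + 1):
        command = generate(query, docs, feedback)
        ok, feedback = validate(command)
        if ok:
            return command, attempt
    return None, max_rounds


def toy_validate(command: str) -> Tuple[bool, str]:
    """Assumed stand-in check; it only inspects tokens and runs nothing."""
    tokens = shlex.split(command)
    if not tokens or tokens[0] != "ffmpeg":
        return False, "command must start with ffmpeg"
    if "-i" not in tokens:
        return False, "missing -i input flag"
    return True, ""


def toy_generate(query: str, docs: List[str], feedback: Optional[str]) -> str:
    """Stub LLM: the first answer omits -i, the retry reacts to the feedback."""
    if feedback is None:
        return "ffmpeg input.mp4 -c:v libx264 output.mp4"
    return "ffmpeg -i input.mp4 -c:v libx264 output.mp4"


corpus = {
    "encode h264 video": "-c:v libx264 selects the H.264 encoder",
    "scale resize video": "-vf scale=W:H resizes frames",
}
command, rounds = generate_with_reflection(
    "encode video to h264", corpus, toy_generate, toy_validate
)
print(command, rounds)
```

In this toy run the first candidate fails validation because it lacks `-i`, the error message is passed back to the generator, and the second candidate is accepted. Passing the generator and validator in as callables keeps the loop independent of any particular model or tool, which is a design choice of this sketch rather than something the abstract specifies.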
