Current LLM-based coding agents follow a serial execution paradigm: the model first generates the complete code and only then invokes an interpreter to execute it. This sequential workflow leaves the executor idle during generation and the generator idle during execution, adding unnecessary end-to-end latency. We observe that, unlike human developers, LLMs emit code tokens sequentially and never revise tokens already produced, which makes it possible to execute code while it is still being generated. We formalize this parallel execution paradigm as a three-stage pipeline of generation, detection, and execution, and derive closed-form latency bounds that characterize its speedup potential and operating regimes. We then present Eager, a concrete implementation featuring AST-based chunking, dynamic batching with gated execution, and early error interruption. We evaluate Eager on four benchmarks, seven LLMs, and three execution environments. Eager reduces non-overlapped execution latency by up to 99.9% and end-to-end latency by up to 55%.
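To make the idea of executing code during generation concrete, the following is a minimal Python sketch of AST-based chunking over a token stream. It is not Eager's implementation: the class name `StreamingExecutor`, its methods `feed` and `close`, and the rule "a top-level statement is complete once the next top-level statement has started" are assumptions chosen for illustration, and dynamic batching and gated execution are omitted. Because the stream is append-only, every statement except the last one in a parseable prefix can no longer change and is safe to run.

```python
import ast


class StreamingExecutor:
    """Hypothetical sketch: run top-level statements as soon as they are complete."""

    def __init__(self):
        self.buffer = ""      # generated text that has not been executed yet
        self.namespace = {}   # globals shared by all executed chunks

    def feed(self, token: str) -> None:
        """Append one generated token and execute any newly completed statements."""
        self.buffer += token
        cut = self.buffer.rfind("\n")
        if cut == -1:
            return  # no complete line yet
        self._run(self.buffer[: cut + 1], final=False)

    def close(self) -> None:
        """Signal end of generation and execute whatever remains."""
        self._run(self.buffer, final=True)

    def _run(self, text: str, final: bool) -> None:
        try:
            tree = ast.parse(text)
        except SyntaxError:
            if final:
                raise  # the finished program really is malformed
            return     # prefix is still incomplete; wait for more tokens
        # While streaming, hold back the last statement: later tokens may
        # still extend it (for example with an `else:` branch).
        ready = tree.body if final else tree.body[:-1]
        if not ready:
            return
        for node in ready:
            module = ast.Module(body=[node], type_ignores=[])
            # An exception raised here propagates to the caller, which can
            # then stop generation early instead of waiting for the full program.
            exec(compile(module, "<stream>", "exec"), self.namespace)
        # Drop the executed lines; `text` is a prefix of the buffer, so the
        # line numbers reported by the parser index into the buffer directly.
        lines = self.buffer.split("\n")
        self.buffer = "\n".join(lines[ready[-1].end_lineno:])
```

In this sketch the caller would invoke `feed` once per generated token and `close` when the model stops. Chunk detection here re-parses the pending buffer on every completed line, which is simple but not optimized.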