As a cornerstone of the modern digital economy, 3D modeling and rendering demand substantial resources and manual effort when scene editing is performed in the traditional manner. Despite recent progress in agents based on vision-language models (VLMs) for 3D editing, the fundamental trade-off between editing precision and agent responsiveness remains unresolved. To address this trade-off, we present EZBlender, a Blender agent with a hybrid framework that combines planning-based task decomposition with reactive local autonomy for efficient human-AI collaboration and semantically faithful 3D editing. Specifically, this previously unexplored Plan-and-ReAct design preserves editing quality while significantly reducing latency and computational cost. To further validate the efficiency and effectiveness of the proposed edge-autonomy architecture, we construct a dedicated multi-tasking benchmark, which has not been systematically investigated in prior research. In addition, we provide a comprehensive analysis of language model preference, system responsiveness, and economic efficiency.