We present a production-optimized multi-agent system designed to translate natural language queries into executable Python code for structured data analytics. Unlike systems that rely on expensive frontier models, our approach achieves high accuracy and cost efficiency through three key innovations: (1) a semantic caching system with LLM-based equivalence detection and structured adaptation hints, which achieves a 67% cache hit rate on production queries; (2) a dual-threshold decision mechanism that separates exact-match retrieval from reference-guided generation; and (3) an intent-driven dynamic prompt assembly system that reduces token consumption by 40-60% through table-aware context filtering. The system has been deployed in production for enterprise inventory management, processing over 10,000 queries with an average latency of 8.2 seconds and 94.3% semantic accuracy. We describe the architecture, present empirical results from production deployment, and discuss practical considerations for deploying LLM-based analytics systems at scale.
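The dual-threshold mechanism in innovation (2) can be illustrated with a minimal sketch: a cached query whose similarity to the incoming query exceeds a high threshold is reused directly, one in an intermediate band serves only as a reference for guided generation, and anything below falls back to full generation. The threshold values, function, and route names below are illustrative assumptions, not the system's actual parameters.

```python
# Hypothetical sketch of a dual-threshold cache routing decision.
# EXACT_THRESHOLD and REFERENCE_THRESHOLD are assumed values for
# illustration only; the paper does not specify them here.

EXACT_THRESHOLD = 0.95      # above this: reuse cached code as-is
REFERENCE_THRESHOLD = 0.80  # above this: cached code guides generation


def route_query(similarity: float) -> str:
    """Map a query-to-cache similarity score onto one of three paths."""
    if similarity >= EXACT_THRESHOLD:
        return "exact_match_retrieval"       # serve cached result directly
    if similarity >= REFERENCE_THRESHOLD:
        return "reference_guided_generation" # cached code used as a reference
    return "full_generation"                 # no useful cache entry


print(route_query(0.97))  # exact_match_retrieval
print(route_query(0.85))  # reference_guided_generation
print(route_query(0.50))  # full_generation
```

Separating the two thresholds lets the system avoid the failure mode of a single cutoff, where near-miss cache entries are either reused incorrectly or discarded entirely.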