We introduce SEED, an LLM-centric system that allows users to easily create efficient and effective data management applications. SEED comprises three main components: code generation, model generation, and augmented LLM query. Together they address two challenges: LLM services are computationally and economically expensive, and they do not work well on every case of a given data management task. SEED addresses the expense challenge by localizing LLM computation as much as possible. This includes replacing most LLM calls with local code and local models, and augmenting LLM queries with techniques such as batching and data access tools. To ensure effectiveness, SEED features a range of optimization techniques that enhance the localized solution and the LLM queries, including automatic code validation, code ensemble, model representative selection, and selective tool usage. Moreover, SEED lets users easily construct a data management solution customized to their applications: users configure each component and compose an execution pipeline in natural language, and SEED automatically compiles it into an executable program. We showcase the efficiency and effectiveness of SEED on diverse data management tasks such as data imputation and NL2SQL translation, achieving state-of-the-art few-shot performance while significantly reducing the number of required LLM calls.
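The localization idea described above (answer locally where possible, and batch whatever still needs the LLM) can be illustrated with a minimal sketch. This is not SEED's implementation: `run_pipeline`, `rule_from_zip`, and `fake_llm` are hypothetical names, the lookup rule merely stands in for generated code, and the LLM is a stub that makes no network call.

```python
from typing import Callable, Optional, Sequence

# All names here are hypothetical illustrations; none come from SEED itself.
Record = dict
LocalStep = Callable[[Record], Optional[str]]  # returns None when it abstains
BatchLLM = Callable[[Sequence[Record]], list]  # one call answers a whole batch


def run_pipeline(records: Sequence[Record],
                 local_steps: Sequence[LocalStep],
                 llm: BatchLLM,
                 batch_size: int = 4):
    """Answer each record locally when possible; batch the rest for the LLM."""
    answers = [None] * len(records)
    pending = []  # indices of records that no local step could answer
    for i, rec in enumerate(records):
        for step in local_steps:
            out = step(rec)
            if out is not None:
                answers[i] = out
                break
        else:
            # Every local step abstained, so defer this record to the LLM.
            pending.append(i)

    llm_calls = 0
    for start in range(0, len(pending), batch_size):
        chunk = pending[start:start + batch_size]
        outs = llm([records[i] for i in chunk])  # one call per batch
        llm_calls += 1
        for i, out in zip(chunk, outs):
            answers[i] = out
    return answers, llm_calls


def rule_from_zip(rec: Record) -> Optional[str]:
    """Stand-in for generated local code: a lookup that abstains on unknown input."""
    return {"02139": "Cambridge", "10001": "New York"}.get(rec.get("zip"))


def fake_llm(batch: Sequence[Record]) -> list:
    """Stand-in for a remote LLM service; returns a placeholder per record."""
    return ["<llm answer>" for _ in batch]


records = [{"zip": "02139"}, {"zip": "99999"}, {"zip": "10001"}, {"zip": "00000"}]
answers, llm_calls = run_pipeline(records, [rule_from_zip], fake_llm)
```

In this toy imputation run, two of the four records are resolved by the local rule and the other two share a single batched LLM call. A local model could be added as another entry in `local_steps`, provided it also abstains by returning `None` when it is not confident.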