SEED: Simple, Efficient, and Effective Data Management via Large Language Models

We introduce SEED, an LLM-centric system that lets users easily create efficient and effective data management applications. SEED comprises three main components: code generation, model generation, and augmented LLM query. Together they address two challenges: LLM services are computationally and economically expensive, and they do not always work well on all cases of a given data management task. SEED addresses the expense challenge by localizing LLM computation as much as possible: it replaces most LLM calls with local code and local models, and it augments LLM queries with batching and data access tools. To ensure effectiveness, SEED features a set of optimization techniques that enhance both the localized solutions and the LLM queries, including automatic code validation, code ensembles, model representative selection, and selective tool usage. Moreover, SEED lets users easily construct a data management solution customized to their application: users configure each component and compose an execution pipeline in natural language, and SEED automatically compiles it into an executable program. We showcase the efficiency and effectiveness of SEED on diverse data management tasks such as data imputation and NL2SQL translation, achieving state-of-the-art few-shot performance while significantly reducing the number of required LLM calls.
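
The localization idea can be illustrated with a minimal sketch. This is not SEED's implementation: the stage functions, the `run_pipeline` helper, the toy country-imputation task, and the stubbed `fake_llm_batch` are all hypothetical names introduced here for illustration. The sketch assumes each local stage either answers a record or defers, and that only deferred records are sent to the LLM, grouped into batches.

```python
from typing import Callable, Optional

# Hypothetical stage type: returns an answer, or None to defer to the next stage.
Stage = Callable[[dict], Optional[str]]

def code_stage(record: dict) -> Optional[str]:
    """Stand-in for generated local code: a cheap rule for easy cases."""
    known = {"Boston": "USA", "Paris": "France"}
    return known.get(record.get("city"))

def model_stage(record: dict) -> Optional[str]:
    """Stand-in for a small local model; defers when it cannot decide."""
    if record.get("phone", "").startswith("+49"):
        return "Germany"
    return None

def fake_llm_batch(batch: list) -> list:
    """Placeholder for a remote LLM call; returns one answer per record."""
    return ["unknown"] * len(batch)

def run_pipeline(records, stages, llm_batch, batch_size=2):
    """Try local stages first; send only unresolved records to the LLM in batches."""
    answers = [None] * len(records)
    deferred = []
    for i, rec in enumerate(records):
        for stage in stages:
            out = stage(rec)
            if out is not None:
                answers[i] = out
                break
        else:
            # No local stage produced an answer for this record.
            deferred.append(i)
    llm_calls = 0
    for start in range(0, len(deferred), batch_size):
        chunk = deferred[start:start + batch_size]
        outs = llm_batch([records[i] for i in chunk])
        llm_calls += 1
        for i, out in zip(chunk, outs):
            answers[i] = out
    return answers, llm_calls

records = [
    {"city": "Boston"},
    {"city": "Berlin", "phone": "+49 30 1234"},
    {"city": "Lima"},
    {"city": "Quito"},
    {"city": "Oslo"},
]
answers, llm_calls = run_pipeline(records, [code_stage, model_stage], fake_llm_batch)
print(answers, llm_calls)
```

In this toy run, two of the five records are resolved locally and the remaining three share two batched calls, rather than each of the five records costing one LLM call.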
