Data curation is a wide-ranging area that comprises many critical but time-consuming data processing tasks. The diversity of these tasks, however, makes it challenging to develop a general-purpose data curation system. To address this issue, we present Lingua Manga, a user-friendly and versatile system built on pre-trained large language models. Lingua Manga provides automatic optimization to achieve high performance and label efficiency while supporting flexible and rapid development. Through three example applications with distinct objectives and users of varying technical proficiency, we demonstrate that Lingua Manga can effectively assist skilled programmers as well as low-code and even no-code users in addressing data curation challenges.