There has been significant recent interest in harnessing LLMs to control software systems through multi-step reasoning, planning, and tool use. While some promising results have been obtained, applying these methods to specific domains raises several general issues, including the control of specialized domain tools, the lack of existing datasets for training and evaluation, and the difficulty of automating system evaluation and improvement. In this paper, we present a case study that examines these issues in the context of one domain, mathematical pedagogy. Specifically, we present an automated math visualizer and solver system. The system orchestrates mathematical solvers and math graphing tools to produce accurate visualizations from simple natural language commands. We describe the creation of specialized datasets and develop an auto-evaluator that evaluates the system's outputs by comparing them to ground-truth expressions. We have open-sourced the datasets and code for the proposed system.