MathViz-E: A Case-study in Domain-Specialized Tool-Using Agents

There has been significant recent interest in harnessing LLMs to control software systems through multi-step reasoning, planning, and tool use. While some promising results have been obtained, applying these methods to specific domains raises several general issues, including the control of specialized domain tools, the lack of existing datasets for training and evaluation, and the difficulty of automating system evaluation and improvement. In this paper, we present a case study that examines these issues in the context of a specific domain: an automated math visualizer and solver system for mathematical pedagogy. The system orchestrates mathematical solvers and math graphing tools to produce accurate visualizations from simple natural language commands. We describe the creation of specialized datasets and develop an auto-evaluator that evaluates the outputs of our system by comparing them to ground-truth expressions. We have open-sourced the datasets and code for the proposed system.
