MeNTi: Bridging Medical Calculator and LLM Agent with Nested Tool Calling

Integrating tools into Large Language Models (LLMs) has facilitated their widespread application. In specialized downstream tasks, however, relying on tools alone is insufficient to address real-world complexity, which particularly restricts the effective deployment of LLMs in fields such as medicine. In this paper, we focus on the downstream task of medical calculators, which use standardized tests to assess an individual's health status. We introduce MeNTi, a universal agent architecture for LLMs. MeNTi integrates a specialized medical toolkit and employs meta-tool and nested calling mechanisms to enhance LLM tool utilization. Specifically, it achieves flexible tool selection and nested tool calling to address practical issues that arise in intricate medical scenarios, including calculator selection, slot filling, and unit conversion. To assess the capabilities of LLMs for quantitative assessment throughout the clinical process in calculator scenarios, we introduce CalcQA. This benchmark requires LLMs to use medical calculators to perform calculations and assess patient health status. CalcQA is constructed by professional physicians and includes 100 case-calculator pairs, complemented by a toolkit of 281 medical tools. The experimental results demonstrate significant performance improvements with our framework. This research opens new directions for applying LLMs in demanding medical scenarios.
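
To make the idea of nested tool calling concrete, the following is a minimal sketch, not MeNTi's actual implementation. It assumes a hypothetical registry with a single calculator (body mass index) and a few hypothetical unit-conversion tools. The names `CONVERTERS`, `CALCULATORS`, `convert`, and `run_calculator` are all illustrative and do not come from the paper. While filling each slot, the calculator call triggers an inner conversion call whenever the extracted value is not in the unit the calculator expects.

```python
from typing import Callable, Dict, Tuple

# Hypothetical unit-conversion tools, keyed by (from_unit, to_unit).
CONVERTERS: Dict[Tuple[str, str], Callable[[float], float]] = {
    ("lb", "kg"): lambda v: v * 0.45359237,
    ("in", "m"): lambda v: v * 0.0254,
    ("cm", "m"): lambda v: v / 100.0,
}


def convert(value: float, from_unit: str, to_unit: str) -> float:
    """Inner tool: convert a value between units (identity if they match)."""
    if from_unit == to_unit:
        return value
    return CONVERTERS[(from_unit, to_unit)](value)


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in meters squared."""
    return weight_kg / height_m ** 2


# Hypothetical calculator registry: each entry lists its slots and required units.
CALCULATORS = {
    "bmi": {"func": bmi, "slots": {"weight_kg": "kg", "height_m": "m"}},
}


def run_calculator(name: str, extracted: Dict[str, Tuple[float, str]]) -> float:
    """Outer tool: fill each slot, calling the conversion tool where needed."""
    spec = CALCULATORS[name]
    args = {}
    for slot, required_unit in spec["slots"].items():
        value, unit = extracted[slot]
        args[slot] = convert(value, unit, required_unit)  # nested tool call
    return spec["func"](**args)


# Values as they might be extracted from a case description (illustrative only).
result = run_calculator(
    "bmi", {"weight_kg": (154.0, "lb"), "height_m": (170.0, "cm")}
)
print(round(result, 2))
```

In this sketch, calculator selection is reduced to a dictionary lookup by name. In an agent setting, an LLM would choose the calculator and extract the slot values from the patient case.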

