Knowledge of the medical decision process, which can be modeled as medical decision trees (MDTs), is critical to building clinical decision support systems. However, current MDT construction methods rely heavily on time-consuming and laborious manual annotation. In this work, we propose a novel task, Text2MDT, to explore the automatic extraction of MDTs from medical texts such as medical guidelines and textbooks. We normalize the form of the MDT and create an annotated Text-to-MDT dataset in Chinese with the participation of medical experts. We investigate two methods for the Text2MDT task: (a) an end-to-end framework that relies solely on instruction tuning of a GPT-style large language model (LLM) to generate all the node information and tree structures; and (b) a pipeline framework that decomposes the Text2MDT task into three subtasks. Experiments on our Text2MDT dataset demonstrate that: (a) the end-to-end method based on LLMs (7B parameters or larger) shows promising results and outperforms the pipeline methods; (b) chain-of-thought (COT) prompting \cite{Wei2022ChainOT} improves the performance of the fine-tuned LLMs on the Text2MDT test set; and (c) the lightweight pipeline method based on encoder-based pretrained models performs comparably with LLMs while having a model complexity two orders of magnitude smaller. Our Text2MDT dataset is open-sourced at \url{https://tianchi.aliyun.com/dataset/95414}, and the source code is open-sourced at \url{https://github.com/michael-wzhu/text2dt}.