Large Language Models (LLMs) have succeeded at language understanding and reasoning on general topics. However, their ability to perform inference over user-specified structured data and to reason with knowledge of corpus-rare concepts, such as causal decision-making, remains limited. In this work, we explore fine-tuning an open-source LLM into LLM4Causal, which can identify the causal task, execute the corresponding function, and interpret its numerical results based on the user's query and the provided dataset. We also propose a data generation process for more controllable GPT prompting and present two instruction-tuning datasets: (1) Causal-Retrieval-Bench, for causal problem identification and for extracting the input parameters for causal function calling, and (2) Causal-Interpret-Bench, for in-context causal interpretation. Through end-to-end evaluations and two ablation studies, we show that LLM4Causal delivers end-to-end solutions for causal problems and provides easy-to-understand answers, significantly outperforming the baselines.