The hallucination problem is recognized as a fundamental deficiency of large language models (LLMs), especially when they are applied to fields such as finance, education, and law. Despite growing concern, empirical investigation of the problem has been lacking. In this paper, we empirically examine LLMs' hallucination behaviors in financial tasks. First, we investigate LLMs' ability to explain financial concepts and terminology. Second, we assess LLMs' ability to query historical stock prices. Third, to alleviate hallucination, we evaluate the efficacy of four practical methods: few-shot learning, Decoding by Contrasting Layers (DoLa), Retrieval-Augmented Generation (RAG), and prompt-based tool learning, in which the model generates a query command for a function. Our major finding is that off-the-shelf LLMs exhibit serious hallucination behaviors in financial tasks. There is therefore an urgent need for research efforts to mitigate LLM hallucination.
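The idea behind prompt-based tool learning for stock-price queries is that the model emits a structured query command rather than recalling a number from its parameters, and the command is then executed against a trusted data source. The following is a minimal sketch of the execution side only, assuming a hypothetical command format `get_stock_price("TICKER", "YYYY-MM-DD")`; the function name, the command syntax, and the `execute_query` helper are illustrative assumptions, not the method or prompt used in the paper. Returning `None` for malformed commands or missing records lets the caller decline to answer instead of falling back to a guessed (possibly hallucinated) price.

```python
import re
from datetime import date
from typing import Callable, Dict, Optional

# Hypothetical command format the LLM is prompted to emit, e.g.
#   get_stock_price("AAPL", "2023-01-03")
# The pattern and tool names are illustrative assumptions.
COMMAND_RE = re.compile(
    r'^\s*(\w+)\(\s*"([^"]+)"\s*,\s*"(\d{4}-\d{2}-\d{2})"\s*\)\s*$'
)

# A price lookup maps (ticker, trading day) to a closing price, or None if unknown.
PriceLookup = Callable[[str, date], Optional[float]]


def execute_query(command: str, tools: Dict[str, PriceLookup]) -> Optional[float]:
    """Parse a model-generated query command and run it against a trusted source.

    Returns None when the command is malformed, names an unknown tool,
    contains an invalid date, or the data source has no record.
    """
    match = COMMAND_RE.match(command)
    if match is None:
        return None  # the model answered in free text instead of a command
    name, ticker, day = match.groups()
    tool = tools.get(name)
    if tool is None:
        return None  # the model invented a tool that does not exist
    try:
        parsed_day = date.fromisoformat(day)
    except ValueError:
        return None  # well-formed pattern but impossible date, e.g. 2024-02-30
    return tool(ticker.upper(), parsed_day)
```

In practice the lookup in `tools` would wrap a market-data database or API; keeping it as an injected callable separates the model's job (producing a well-formed command) from the data access that actually supplies the number.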