Large language models (LLMs) have demonstrated great potential in the financial domain, making it important to assess their performance on financial tasks. In this work, we introduce CFBenchmark to evaluate the performance of LLMs as Chinese financial assistants. Its basic version, CFBenchmark-Basic, is designed to evaluate fundamental abilities in Chinese financial text processing across three aspects~(\emph{i.e.,} recognition, classification, and generation) comprising eight tasks, and includes financial texts ranging in length from 50 to over 1,800 characters. We conduct experiments on several LLMs available in the literature with CFBenchmark-Basic, and the results indicate that while some LLMs perform well on specific tasks, existing models overall still leave significant room for improvement on basic financial text processing tasks. In the future, we plan to develop an advanced version of CFBenchmark, aiming to probe the broader capabilities of language models as Chinese financial assistants along more demanding dimensions. Our code is released at https://github.com/TongjiFinLab/CFBenchmark.