Large Language Models (LLMs) have demonstrated remarkable performance on a wide range of Natural Language Processing (NLP) tasks, often matching or even beating state-of-the-art task-specific models. This study assesses the financial reasoning capabilities of LLMs. We use mock exam questions from the Chartered Financial Analyst (CFA) Program to conduct a comprehensive evaluation of ChatGPT and GPT-4 in financial analysis, considering Zero-Shot (ZS), Chain-of-Thought (CoT), and Few-Shot (FS) scenarios. We present an in-depth analysis of the models' performance and limitations, and estimate whether they would stand a chance of passing the CFA exams. Finally, we outline potential strategies and improvements to enhance the applicability of LLMs in finance. We hope this work paves the way for future studies to continue enhancing LLMs for financial reasoning through rigorous evaluation.