Recent large language models (LLMs) such as ChatGPT and GPT-4 have attracted significant attention for their ability to generate high-quality responses to human input. Although ChatGPT and GPT-4 have been tested extensively on generic text corpora, where they have demonstrated impressive capabilities, no study has yet focused on financial corpora. In this study, we aim to bridge this gap by examining the potential of ChatGPT and GPT-4 as solvers for typical financial text analytics problems in zero-shot or few-shot settings. Specifically, we assess their capabilities on four representative tasks over five distinct financial text datasets. This preliminary study shows that ChatGPT and GPT-4 struggle on tasks that require domain-specific knowledge, such as financial named entity recognition (NER) and sentiment analysis, while they excel at numerical reasoning tasks. We report both the strengths and the limitations of the current versions of ChatGPT and GPT-4, comparing them with state-of-the-art fine-tuned models as well as pretrained domain-specific generative models. Our experiments include qualitative studies, through which we hope to improve understanding of the capabilities of existing models and facilitate further improvements.