Collecting labeled datasets in finance is challenging due to the scarcity of domain experts and the high cost of employing them. While Large Language Models (LLMs) have demonstrated remarkable performance in data annotation tasks on general-domain datasets, their effectiveness on domain-specific datasets remains underexplored. To address this gap, we investigate the potential of LLMs as efficient data annotators for extracting relations in financial documents. We compare the annotations produced by three LLMs (GPT-4, PaLM 2, and MPT Instruct) against those of expert annotators and crowdworkers. We demonstrate that current state-of-the-art LLMs can be viable alternatives to non-expert crowdworkers. We analyze the models under various prompts and parameter settings and find that customizing the prompt for each relation group, by providing examples specific to that group, is paramount. Furthermore, we introduce a reliability index (LLM-RelIndex) to identify outputs that may require expert attention. Finally, we perform an extensive time, cost, and error analysis and provide recommendations for the collection and use of automated annotations in domain-specific settings.