We present FinTree, the Financial Dataset Pretrain Transformer Encoder for Relation Extraction. Starting from an encoder language model, we further pretrain FinTree on a financial dataset to adapt it to financial-domain tasks. Inspired by the Pattern-Exploiting Training methodology, FinTree uses a novel structure that predicts a masked token instead of relying on the conventional [CLS] token, which allows more accurate prediction of the relation between two given entities. The model is trained with a unique input pattern that provides contextual and positional information about the entities of interest, and a post-processing step keeps predictions consistent with the entity types. Our experiments demonstrate that FinTree achieves superior performance on REFinD, a large-scale financial relation extraction dataset. The code and pretrained models are available at https://github.com/HJ-Ok/FinTree.
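The masked-token formulation and the type-aware post-processing described above can be illustrated with a minimal sketch. The abstract does not give the actual template or label set, so everything here is an assumption made for illustration: the entity markers (`[E1]`, `[E2]`), the prompt wording, the relation names, and the `ALLOWED` type map are hypothetical, and the model scores are passed in as a plain dictionary rather than produced by a real encoder.

```python
from typing import Dict, List, Tuple

MASK = "[MASK]"

# Hypothetical map from (head type, tail type) to the relations permitted for that pair.
ALLOWED: Dict[Tuple[str, str], List[str]] = {
    ("PERSON", "ORG"): ["employee_of", "founder_of", "no_relation"],
    ("ORG", "DATE"): ["formed_on", "acquired_on", "no_relation"],
}


def build_pattern(tokens: List[str], head: Tuple[int, int], tail: Tuple[int, int]) -> str:
    """Mark both entity spans and append a cloze prompt ending in a mask slot.

    head and tail are (start, end) token indices, with end exclusive.
    The markers give positional information; the prompt names the entities.
    """
    marked: List[str] = []
    for i, tok in enumerate(tokens):
        if i == head[0]:
            marked.append("[E1]")
        if i == tail[0]:
            marked.append("[E2]")
        marked.append(tok)
        if i == head[1] - 1:
            marked.append("[/E1]")
        if i == tail[1] - 1:
            marked.append("[/E2]")
    head_text = " ".join(tokens[head[0]:head[1]])
    tail_text = " ".join(tokens[tail[0]:tail[1]])
    prompt = f"The relation between {head_text} and {tail_text} is {MASK} ."
    return " ".join(marked) + " " + prompt


def postprocess(scores: Dict[str, float], head_type: str, tail_type: str) -> str:
    """Pick the best-scoring relation among those allowed for the entity types."""
    allowed = ALLOWED.get((head_type, tail_type), ["no_relation"])
    candidates = {rel: s for rel, s in scores.items() if rel in allowed}
    if not candidates:
        return "no_relation"
    return max(candidates, key=candidates.get)


tokens = "Jane Doe joined Acme Corp in 2019".split()
text = build_pattern(tokens, head=(0, 2), tail=(3, 5))

# Stand-in for the scores a masked-language-model head would assign at the mask slot.
mask_scores = {"formed_on": 0.6, "employee_of": 0.3, "no_relation": 0.1}
prediction = postprocess(mask_scores, "PERSON", "ORG")
```

In this toy run the highest raw score belongs to `formed_on`, but that relation is not valid for a PERSON/ORG pair, so the post-processing step falls back to the best allowed label. In a real setup the scores would come from the encoder's output at the mask position, mapped to relation labels through a verbalizer.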