LabelFusion: Fusing Large Language Models with Transformer Encoders for Robust Financial News Classification

Financial news plays a central role in shaping investor sentiment and short-term dynamics in commodity markets. Many downstream financial applications, such as commodity price prediction or sentiment modeling, therefore rely on the ability to automatically identify news articles relevant to specific assets. However, obtaining large labeled corpora for financial text classification is costly, and transformer-based classifiers such as RoBERTa often degrade significantly in low-data regimes. Our results show that appropriately prompted out-of-the-box Large Language Models (LLMs) achieve strong performance even in such settings. Furthermore, we propose LabelFusion, a hybrid architecture that combines the output of a prompt-engineered LLM with contextual embeddings produced by a fine-tuned RoBERTa encoder through a lightweight Multilayer Perceptron (MLP) voting layer. Evaluated on a ten-class multi-label subset of the Reuters-21578 corpus, LabelFusion achieves a macro F1 score of 96.0% and an accuracy of 92.3% when trained on the full dataset, outperforming both standalone RoBERTa (F1 94.6%) and the standalone LLM (F1 93.9%). In low- to mid-data regimes, however, the LLM alone proves surprisingly competitive, achieving an F1 score of 75.9% even in a zero-shot setting and consistently outperforming LabelFusion until approximately 80% of the training data is available. These results suggest that LLM-only prompting is the preferred strategy under annotation constraints, whereas LabelFusion becomes the most effective solution once sufficient labeled data is available to train the encoder component. The code is available in an anonymized repository.
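The fusion step described above can be illustrated with a minimal sketch. The abstract does not say how the LLM output is encoded, how wide the MLP is, or which RoBERTa variant is used, so the following are assumptions made purely for illustration: the LLM output is a ten-dimensional multi-hot vector, the encoder embedding has 768 dimensions, and the hypothetical `fuse` function concatenates the two before a two-layer MLP with per-class sigmoid outputs. The weights are random placeholders, not trained parameters, and a real implementation would likely use a deep learning framework rather than plain Python.

```python
import math
import random

NUM_CLASSES = 10   # ten-class multi-label setting described in the abstract
EMBED_DIM = 768    # assumption: hidden size of a RoBERTa-base encoder
HIDDEN_DIM = 32    # assumption: hypothetical width of the MLP hidden layer

rng = random.Random(0)  # fixed seed so the sketch is reproducible


def init_layer(in_dim, out_dim):
    """Return (weights, biases) filled with small random placeholder values."""
    weights = [[rng.gauss(0.0, 0.02) for _ in range(in_dim)] for _ in range(out_dim)]
    biases = [0.0] * out_dim
    return weights, biases


def linear(x, layer):
    """Apply an affine map: one dot product plus bias per output unit."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def fuse(llm_scores, encoder_embedding, hidden_layer, output_layer):
    """Concatenate LLM label scores with the encoder embedding and
    map them to one independent probability per class (multi-label)."""
    x = list(llm_scores) + list(encoder_embedding)
    hidden = [max(0.0, v) for v in linear(x, hidden_layer)]  # ReLU
    logits = linear(hidden, output_layer)
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]      # sigmoid


hidden_layer = init_layer(NUM_CLASSES + EMBED_DIM, HIDDEN_DIM)
output_layer = init_layer(HIDDEN_DIM, NUM_CLASSES)

# Stand-ins for real model outputs (assumed formats, random values).
llm_scores = [float(rng.randint(0, 1)) for _ in range(NUM_CLASSES)]
encoder_embedding = [rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)]

probs = fuse(llm_scores, encoder_embedding, hidden_layer, output_layer)
predicted = [p >= 0.5 for p in probs]  # assumed 0.5 decision threshold per class
```

Using a sigmoid per class rather than a softmax over classes reflects the multi-label setting, in which one article may be relevant to several commodities at once.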
