Text classification is among the many tasks that Large Language Models (LLMs) have revolutionized. However, existing approaches for applying pretrained LLMs to text classification predominantly rely on a single token's output from only the last layer of hidden states. As a result, they suffer from limitations in efficiency, task-specificity, and interpretability. In our work, we contribute an approach that uses all internal representations by applying multiple pooling strategies to the activations and hidden states of every layer. Our novel lightweight strategy, Sparsify-then-Classify (STC), first sparsifies task-specific features layer by layer, then aggregates them across layers for text classification. STC can be applied as a seamless plug-and-play module on top of existing LLMs. Our experiments on a comprehensive set of models and datasets demonstrate that STC not only consistently improves the classification performance of pretrained and fine-tuned models, but is also more efficient in both training and inference and more intrinsically interpretable.
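The pipeline described above (pool every layer, sparsify layer by layer, aggregate across layers, classify) can be illustrated with a minimal sketch. Several pieces here are assumptions rather than the method itself: the abstract does not say how features are sparsified or which classifier is used, so this sketch substitutes a simple class-mean-gap feature selection and a nearest-centroid classifier, and it uses synthetic random activations in place of real LLM hidden states. All function names are hypothetical.

```python
import random

def pool_tokens(layer_states):
    """Mean- and max-pool one layer's token vectors into one feature vector."""
    dim = len(layer_states[0])
    mean = [sum(tok[d] for tok in layer_states) / len(layer_states) for d in range(dim)]
    peak = [max(tok[d] for tok in layer_states) for d in range(dim)]
    return mean + peak  # list concatenation: length 2 * dim

def select_features(pooled, labels, keep):
    """Stand-in sparsifier (assumption): keep features whose class means differ most."""
    pos = [x for x, y in zip(pooled, labels) if y == 1]
    neg = [x for x, y in zip(pooled, labels) if y == 0]
    def gap(d):
        return abs(sum(x[d] for x in pos) / len(pos) - sum(x[d] for x in neg) / len(neg))
    return sorted(range(len(pooled[0])), key=gap, reverse=True)[:keep]

def fit_selection(examples, labels, keep):
    """Sparsify each layer independently; examples[i][l] is a tokens x dim matrix."""
    n_layers = len(examples[0])
    return [select_features([pool_tokens(ex[l]) for ex in examples], labels, keep)
            for l in range(n_layers)]

def aggregate(example, selected):
    """Concatenate the kept features of every layer into one vector."""
    feats = []
    for l, idx in enumerate(selected):
        pooled = pool_tokens(example[l])
        feats.extend(pooled[d] for d in idx)
    return feats

def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def predict(x, c0, c1):
    """Nearest-centroid stand-in for the final classifier (assumption)."""
    d0 = sum((a - b) ** 2 for a, b in zip(x, c0))
    d1 = sum((a - b) ** 2 for a, b in zip(x, c1))
    return 1 if d1 < d0 else 0

# Synthetic stand-in for LLM activations: 2 layers, 3 tokens, 4 dims per token.
rng = random.Random(0)
labels = [i % 2 for i in range(40)]
examples = []
for y in labels:
    layers = [[[rng.gauss(0, 1) for _ in range(4)] for _ in range(3)] for _ in range(2)]
    for tok in layers[1]:
        tok[1] += 5.0 * y  # only layer 1, dimension 1 carries the label signal
    examples.append(layers)

selected = fit_selection(examples, labels, keep=2)
features = [aggregate(ex, selected) for ex in examples]
c0 = centroid([f for f, y in zip(features, labels) if y == 0])
c1 = centroid([f for f, y in zip(features, labels) if y == 1])
accuracy = sum(predict(f, c0, c1) == y for f, y in zip(features, labels)) / len(labels)
print(selected, accuracy)
```

Because each layer keeps only a few pooled features, the indices in `selected` show directly which layers and dimensions the classifier relies on, which is one plausible reading of the interpretability claim. With real models, the synthetic `examples` would be replaced by per-layer activations extracted from the LLM.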