Background/introduction: Pre-trained transformer models excel at many natural language processing tasks and are therefore expected to encode the meaning of an input sentence or text. Such sentence-level embeddings are also important in retrieval-augmented generation. But do the commonly used plain token averaging or prompt templates surface this meaning well enough? Methods: Given the hidden representations of a 110M-parameter BERT from multiple layers and multiple tokens, we tried various ways to extract optimal sentence representations. We tested various token aggregation and representation post-processing techniques. We also tested multiple ways of using the general Wikitext dataset to complement BERT's sentence representations. All methods were evaluated on 8 Semantic Textual Similarity (STS), 6 short text clustering, and 12 classification tasks. We also evaluated our representation-shaping techniques on other static models, including random token representations. Results: The proposed representation extraction methods improved performance on the STS and clustering tasks for all models considered. The improvements were very large for static token-based models; in particular, random embeddings on STS tasks almost reached the performance of BERT-derived representations. Conclusions: Our work shows that, for multiple tasks, simple baselines combined with representation-shaping techniques reach or even outperform more complex BERT-based models, or are able to contribute to their performance.
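As a minimal sketch of the two ingredients the abstract names, the snippet below illustrates one common token aggregation method (mask-aware mean pooling over a layer's token vectors) and one common post-processing step (PCA whitening of a set of sentence embeddings). The function names and the choice of whitening as the shaping step are illustrative assumptions, not necessarily the exact techniques evaluated in the paper.

```python
import numpy as np

def mean_pool(hidden_states, attention_mask):
    """Aggregate token vectors into one sentence vector.

    hidden_states: (num_tokens, dim) array from one transformer layer.
    attention_mask: (num_tokens,) array, 1 for real tokens, 0 for padding.
    """
    mask = attention_mask[:, None].astype(float)
    return (hidden_states * mask).sum(axis=0) / mask.sum()

def whiten(embeddings, eps=1e-9):
    """Post-process a matrix of sentence embeddings (n, dim) by
    centering and decorrelating dimensions (PCA whitening)."""
    mu = embeddings.mean(axis=0)
    cov = np.cov((embeddings - mu).T)      # (dim, dim) covariance
    u, s, _ = np.linalg.svd(cov)           # cov is symmetric PSD
    w = u @ np.diag(1.0 / np.sqrt(s + eps))
    return (embeddings - mu) @ w           # whitened: identity covariance
```

After whitening, every embedding dimension has unit variance and zero correlation with the others, which is one simple way such "shaping" can help similarity tasks dominated by a few high-variance directions.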