Pretrained language models have improved zero-shot text classification by allowing the transfer of semantic knowledge from the training data in order to classify among specific label sets in downstream tasks. We propose a simple way to further improve zero-shot accuracy with minimal effort. We curate small finetuning datasets intended to describe the labels for a task. Unlike typical finetuning data, which has texts annotated with labels, our data simply describes the labels in language, e.g., using a few related terms, dictionary/encyclopedia entries, and short templates. Across a range of topic and sentiment datasets, our method is more accurate than zero-shot classification by 17-19% absolute. It is also more robust to choices required for zero-shot classification, such as patterns for prompting the model to classify and mappings from labels to tokens in the model's vocabulary. Furthermore, since our data merely describes the labels but does not use input texts, finetuning on it yields a model that performs strongly on multiple text domains for a given label set, even improving over few-shot out-of-domain classification in multiple settings.
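The kind of label-description data described above can be sketched as follows. This is a minimal illustration under stated assumptions: a binary sentiment label set, with the related terms (`LABEL_TERMS`), short templates (`TEMPLATES`), definition-style entries (`DEFINITIONS`), prompt pattern (`PATTERN`), and label-to-token mapping (`VERBALIZER`) all being hypothetical placeholders rather than the data curated for this work. The finetuning step itself is omitted.

```python
from itertools import product

# Hypothetical label descriptions for a binary sentiment label set.
# Terms, templates, and definitions are illustrative assumptions,
# not the data curated in the paper.
LABEL_TERMS = {
    "negative": ["bad", "terrible", "awful"],
    "positive": ["good", "great", "excellent"],
}
TEMPLATES = ["{term}", "It was {term}.", "A {term} experience."]
DEFINITIONS = {
    "negative": "negative: expressing an unfavourable or critical opinion.",
    "positive": "positive: expressing a favourable or approving opinion.",
}

# Hypothetical prompt pattern and label-to-token mapping (verbalizer).
PATTERN = "{text} It was [MASK]."
VERBALIZER = {"negative": "terrible", "positive": "great"}


def build_label_description_data(label_terms, templates, definitions):
    """Return (text, label) pairs that describe each label in language.

    No task input texts are used: every example is built only from
    related terms, short templates, and a definition-style entry.
    """
    data = []
    for label, terms in label_terms.items():
        # Wrap each related term in each short template.
        for term, template in product(terms, templates):
            data.append((template.format(term=term), label))
        # Add one dictionary-style entry per label, if available.
        if label in definitions:
            data.append((definitions[label], label))
    return data


def to_prompted_example(text, label, pattern=PATTERN, verbalizer=VERBALIZER):
    """Wrap a description in the prompt pattern; the target is the label's token."""
    return pattern.format(text=text), verbalizer[label]


if __name__ == "__main__":
    examples = build_label_description_data(LABEL_TERMS, TEMPLATES, DEFINITIONS)
    for text, label in examples[:4]:
        print(to_prompted_example(text, label))
    print(len(examples))
```

Because none of these examples contain task input texts, the same small set could in principle be reused across any text domain that shares the label set; how the prompted pairs are turned into a training loss (for example, predicting the target token at the mask position) is left out of this sketch.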