Zero-shot text learning enables text classifiers to handle unseen classes efficiently, reducing the need for task-specific training data. A simple approach compares the embedding of a query text with the embeddings of the candidate classes. However, the embedding of a short query sometimes lacks rich contextual information, which hinders classification performance. Traditionally, this has been addressed by improving the embedding model through expensive training. We introduce QZero, a novel training-free knowledge augmentation approach that reformulates queries by retrieving supporting categories from Wikipedia to improve zero-shot text classification performance. Our experiments across six diverse datasets demonstrate that QZero enhances performance for state-of-the-art static and contextual embedding models without the need for retraining. Notably, in news and medical topic classification tasks, QZero improves the performance of even the largest OpenAI embedding model by at least 5% and 3%, respectively. Acting as a knowledge amplifier, QZero enables small word embedding models to achieve performance levels comparable to those of larger contextual models, offering the potential for significant computational savings. Additionally, QZero offers meaningful insights that illuminate query context and verify topic relevance, aiding in understanding model predictions. Overall, QZero improves embedding-based zero-shot classifiers while maintaining their simplicity. This makes it particularly valuable for resource-constrained environments and domains with constantly evolving information.
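To make the idea concrete, the following is a minimal sketch of embedding-based zero-shot classification with and without query reformulation. It is not the authors' implementation. The three-dimensional word vectors are invented for illustration, `retrieve_categories` is a hypothetical stub standing in for a real Wikipedia category retriever, and joining the original query with the retrieved category names is an assumption about how the reformulated query is built.

```python
import math

# Toy 3-d "word embeddings" (axes roughly: sports, business, technology).
# These numbers are invented for illustration only.
TOY_VECTORS = {
    "sports": [1.0, 0.0, 0.0],
    "business": [0.0, 1.0, 0.0],
    "technology": [0.0, 0.0, 1.0],
    "football": [0.9, 0.1, 0.0],
    "league": [0.8, 0.2, 0.0],
    "teams": [0.9, 0.1, 0.0],
    "trade": [0.1, 0.9, 0.0],
    "deal": [0.1, 0.8, 0.1],
}


def embed(text):
    """Average the vectors of known words; unknown words are skipped."""
    vectors = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vectors:
        return [0.0, 0.0, 0.0]
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def classify(text, labels):
    """Pick the label whose embedding is most similar to the text embedding."""
    query_vec = embed(text)
    return max(labels, key=lambda label: cosine(query_vec, embed(label)))


def retrieve_categories(query):
    """Hypothetical stub: a real system would query a Wikipedia index here."""
    canned = {"giants trade deal": ["football league", "football teams"]}
    return canned.get(query.lower(), [])


def reformulate(query):
    """Assumed reformulation: append retrieved category names to the query."""
    return " ".join([query] + retrieve_categories(query))


LABELS = ["sports", "business", "technology"]
query = "Giants trade deal"

plain_prediction = classify(query, LABELS)
augmented_prediction = classify(reformulate(query), LABELS)
print(plain_prediction, "->", augmented_prediction)
```

In this toy setup the word "Giants" is out of vocabulary, so the plain query is dominated by "trade" and "deal". Once the retrieved categories are appended, the averaged embedding shifts toward the sports label. No model is retrained at any point; only the query text changes.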