We introduce an open-domain topic classification system that accepts a user-defined taxonomy in real time. Users can classify a text snippet against any candidate labels they choose and get an instant response from our web interface. To achieve this flexibility, we build the backend model in a zero-shot manner. Trained on a new dataset constructed from Wikipedia, our label-aware text classifier effectively uses the implicit knowledge in the pretrained language model to handle labels it has never seen before. We evaluate our model on four datasets from various domains with different label sets. Experiments show that the model significantly improves over existing zero-shot baselines in open-domain scenarios and performs competitively with weakly-supervised models trained on in-domain data.
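To make the idea of classifying against user-supplied labels concrete, here is a minimal sketch of what a label-aware interface could look like. It is an illustration only, not the system's implementation: it assumes each (text, label) pair is scored independently and that the scores are normalized with a softmax over the candidate labels. The names `classify` and `toy_scorer` are hypothetical, and `toy_scorer` is a word-overlap stand-in for the trained model.

```python
import math
from typing import Callable, Dict, List

# A scorer maps a (text, label) pair to a real-valued compatibility score.
Scorer = Callable[[str, str], float]


def toy_scorer(text: str, label: str) -> float:
    """Hypothetical stand-in for the trained model.

    Counts how many words of the label occur in the text. A real
    label-aware classifier would compute this score with a pretrained
    language model instead.
    """
    text_tokens = set(text.lower().split())
    label_tokens = label.lower().split()
    return float(sum(tok in text_tokens for tok in label_tokens))


def classify(
    text: str,
    candidate_labels: List[str],
    scorer: Scorer = toy_scorer,
) -> Dict[str, float]:
    """Score the text against each candidate label and normalize.

    The label set is supplied at call time, so nothing here depends on
    a fixed taxonomy.
    """
    if not candidate_labels:
        raise ValueError("at least one candidate label is required")
    scores = [scorer(text, label) for label in candidate_labels]
    # Numerically stable softmax over the candidate labels.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return {label: e / total for label, e in zip(candidate_labels, exps)}


# Usage: the caller picks any labels they like.
probs = classify(
    "the team won the football match",
    ["football", "politics", "cooking"],
)
best = max(probs, key=probs.get)
```

Because the labels are an argument rather than part of the model's output layer, swapping in a different taxonomy requires no retraining in this sketch; only the scorer would need to generalize to unseen labels.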