State-of-the-art natural language processing models have been shown to achieve remarkable performance in 'closed-world' settings, where all the labels in the evaluation set are known at training time. However, in real-world settings, 'novel' instances that do not belong to any known class are often observed. This makes the ability to deal with novelties crucial. To initiate systematic research in this important area of 'dealing with novelties', we introduce 'NoveltyTask', a multi-stage task to evaluate a system's performance on pipelined novelty 'detection' and 'accommodation' tasks. We provide a mathematical formulation of NoveltyTask and instantiate it with the authorship attribution task, which involves identifying the correct author of a given text. We use the Amazon reviews corpus to compile a large dataset (consisting of 250k instances across 200 authors/labels) for NoveltyTask. We conduct comprehensive experiments and explore several baseline methods for the task. Our results show that these methods achieve considerably low performance, which makes the task challenging and leaves substantial room for improvement. Finally, we believe our work will encourage research in this underexplored area of dealing with novelties, an important step en route to developing robust systems.