Zero-Shot Learning (ZSL) is the task of identifying, in text, entities or relations that were not seen during training. ZSL has emerged as a critical research area because labeled data is scarce in specific domains, and its applications have grown significantly in recent years. With the advent of large pretrained language models, several novel methods have been proposed, resulting in substantial improvements in ZSL performance. There is a growing demand, in both the research community and industry, for a comprehensive ZSL framework that makes the latest methods and pretrained models easier to develop and access. In this study, we propose a novel ZSL framework called Zshot that aims to address these challenges. Our primary objective is to provide a platform that allows researchers to compare different state-of-the-art ZSL methods on standard benchmark datasets. Additionally, we have designed our framework to support industry with readily available APIs for production use under the standard SpaCy NLP pipeline. Our API is extensible and supports evaluation. We also include numerous enhancements, such as boosting accuracy through pipeline ensembling and visualization utilities available as a SpaCy extension.
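To make the idea of pipeline ensembling concrete, the following is a minimal sketch of one possible strategy, majority voting over predicted spans. It is not Zshot's API: the `Span` representation, the `ensemble_spans` function, and the `min_votes` parameter are hypothetical names introduced here for illustration, and the framework's actual ensembling strategy may differ.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical span representation: (start_char, end_char, label).
Span = Tuple[int, int, str]


def ensemble_spans(predictions: List[List[Span]], min_votes: int = 2) -> List[Span]:
    """Merge span predictions from several pipelines by majority vote.

    A (start, end) span is kept when its most frequent label was predicted
    by at least `min_votes` pipelines; that label is assigned to the span.
    """
    votes: Dict[Tuple[int, int], Counter] = defaultdict(Counter)
    for spans in predictions:
        # De-duplicate so a single pipeline cannot vote twice for the same labeled span.
        for start, end, label in set(spans):
            votes[(start, end)][label] += 1

    merged: List[Span] = []
    for (start, end), labels in votes.items():
        label, count = labels.most_common(1)[0]
        if count >= min_votes:
            merged.append((start, end, label))
    return sorted(merged)


# Toy predictions from three hypothetical pipelines on the sentence
# "IBM opened a lab in Zurich."
pipeline_a = [(0, 3, "company"), (20, 26, "city")]
pipeline_b = [(0, 3, "company"), (20, 26, "country")]
pipeline_c = [(0, 3, "company"), (20, 26, "city")]

print(ensemble_spans([pipeline_a, pipeline_b, pipeline_c]))
```

In this toy case two of the three pipelines agree on "city" for the second span, so that label is the one retained. Raising `min_votes` trades recall for precision.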