Driven by encouraging results on a wide range of tasks, the field of NLP is experiencing an accelerated race to develop bigger language models. This race has also underscored the need for practical distillation approaches that can leverage the knowledge acquired by these large models in a compute-efficient manner. With this goal in mind, we build on recent work to propose a hallucination-free framework for sequence tagging that is especially suited for distillation. We report new state-of-the-art performance across multiple sequence labelling datasets and validate the usefulness of this framework for distilling a large model in a few-shot learning scenario.