The Natural Language Processing (NLP) community has used crowdsourcing to create benchmark datasets such as the General Language Understanding Evaluation (GLUE) benchmark, which are used to train and evaluate modern language models (LMs) such as BERT. GLUE tasks report the reliability of their annotations using inter-annotator agreement metrics, e.g., Cohen's kappa. However, the reliability of the LMs themselves has often been overlooked. To address this problem, we explore a knowledge-guided LM ensembling approach that uses reinforcement learning to integrate knowledge from ConceptNet and Wikipedia in the form of knowledge graph embeddings. This approach mimics human annotators, who draw on external knowledge to compensate for information missing from the datasets. Across nine GLUE datasets, our results show that ensembling improves both reliability and accuracy scores, outperforming the state of the art.
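For reference, the inter-annotator reliability metric mentioned above, Cohen's kappa, is defined for two annotators as κ = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement rate and p_e is the agreement expected by chance from each annotator's label frequencies. The sketch below is a minimal illustration of that standard definition, not code from this work; the function name `cohens_kappa` and the toy labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items (illustrative sketch)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement p_o: fraction of items both annotators labeled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement p_e: product of each annotator's marginal label frequencies,
    # summed over labels (labels absent from annotator B contribute zero).
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1.0:
        # Degenerate case: both annotators used the same single label everywhere.
        # Kappa is undefined here; this sketch treats it as perfect agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    a = ["yes", "yes", "no", "no"]
    b = ["yes", "no", "no", "no"]
    # p_o = 3/4, p_e = (2*1 + 2*3) / 16 = 1/2, so kappa = (0.75 - 0.5) / 0.5
    print(cohens_kappa(a, b))  # prints 0.5
```

A kappa of 1 indicates perfect agreement and 0 indicates agreement no better than chance, which is why it is preferred over raw percent agreement when label distributions are skewed.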