This paper explores the in-context learning capabilities of masked language models, challenging the common view that this ability does not 'emerge' in them. We present an embarrassingly simple inference technique that enables DeBERTa to operate as a generative model without any additional training. Our findings demonstrate that DeBERTa can match and even surpass GPT-3, its contemporary that famously introduced the paradigm of in-context learning. A comparative analysis reveals that masked and causal language models behave very differently, with each clearly outperforming the other on different categories of tasks. This suggests that there is great potential for a hybrid training approach that combines the strengths of both training objectives.
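To make the idea of "a masked language model operating as a generative model" concrete, the sketch below shows one plausible way to generate text autoregressively with a masked LM: repeatedly append a `[MASK]` token to the prompt, predict it, and commit the prediction. This is an illustrative assumption, not necessarily the paper's exact procedure; the checkpoint name, greedy decoding, and the use of Hugging Face's `AutoModelForMaskedLM` API are placeholders, and the chosen checkpoint must actually ship MLM-head weights.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Placeholder checkpoint: any DeBERTa model whose published weights include
# the masked-LM head (an assumption, not the paper's stated setup).
model_name = "microsoft/deberta-v2-xxlarge"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)
model.eval()

def generate(prompt: str, max_new_tokens: int = 20) -> str:
    # Tokenize as [CLS] prompt [SEP], then drop the trailing [SEP] so we can
    # keep extending the sequence.
    ids = tokenizer(prompt, return_tensors="pt")["input_ids"][0, :-1]
    for _ in range(max_new_tokens):
        # Append [MASK] [SEP] and ask the model to fill in the mask.
        inp = torch.cat(
            [ids, torch.tensor([tokenizer.mask_token_id, tokenizer.sep_token_id])]
        ).unsqueeze(0)
        with torch.no_grad():
            logits = model(input_ids=inp).logits
        # Greedy pick at the [MASK] position (second-to-last token).
        next_id = logits[0, -2].argmax()
        ids = torch.cat([ids, next_id.unsqueeze(0)])
    return tokenizer.decode(ids, skip_special_tokens=True)

print(generate("The capital of France is"))
```

In this sketch the masked LM plays the role of a next-token predictor: each appended `[MASK]` stands in for the unknown continuation, which is why no additional training is needed for generation.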