Generative modeling has been the dominant approach for large-scale pretraining and zero-shot generalization. In this work, we challenge this convention by showing that discriminative approaches perform substantially better than generative ones on a large number of NLP tasks. Concretely, we train a single discriminator to predict whether a text sample comes from the true data distribution, similar to the discriminator in GANs. Since many NLP tasks can be formulated as selecting from a few options, we concatenate the input with each option and use the discriminator to predict which concatenation has the highest probability of coming from the true data distribution. This simple formulation achieves state-of-the-art zero-shot results on the T0 benchmark, outperforming T0 by 16.0\%, 7.8\%, and 11.5\% at three different model scales, respectively. In the finetuning setting, our approach also achieves new state-of-the-art results on a wide range of NLP tasks with only one quarter of the parameters of previous methods. Our approach also requires minimal prompting effort, which substantially improves robustness and is essential for real-world applications. Finally, we jointly train a generalized universal discriminator (UD) in combination with generative tasks; it maintains its advantage on discriminative tasks while also handling generative tasks.
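The option-selection formulation described in the abstract can be illustrated with a minimal sketch. It assumes only that a discriminator maps a text string to a score for "comes from the true data distribution". The names `select_option` and `toy_discriminator` are hypothetical, and the keyword-based scorer is a stand-in for illustration, not the trained model from this work.

```python
from typing import Callable, Sequence


def select_option(
    context: str,
    options: Sequence[str],
    discriminator: Callable[[str], float],
) -> int:
    """Return the index of the option whose concatenation with the
    context receives the highest discriminator score."""
    if not options:
        raise ValueError("options must be non-empty")
    # Score each (context + option) concatenation independently.
    scores = [discriminator(f"{context} {option}") for option in options]
    # Pick the highest-scoring candidate; ties go to the earliest option.
    return max(range(len(options)), key=scores.__getitem__)


def toy_discriminator(text: str) -> float:
    """Hypothetical stand-in for a trained discriminator.

    Returns a made-up score: high when the text mentions both
    "sky" and "blue", low otherwise.
    """
    words = set(text.lower().split())
    return 0.9 if {"sky", "blue"} <= words else 0.1


if __name__ == "__main__":
    options = ["loud", "blue"]
    best = select_option("The sky is", options, toy_discriminator)
    print(options[best])  # blue
```

In a real system the callable passed as `discriminator` would be a trained model that outputs a probability; the selection logic stays the same regardless of how that score is produced.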