Using generated data to improve the performance of downstream discriminative models has recently gained popularity, driven by rapid progress in pre-trained language models. In most previous studies, the generative and discriminative models are trained separately and therefore cannot adapt to changes in each other. As a result, the generated samples can easily deviate from the real data distribution, while the improvement of the discriminative model quickly saturates. Generative adversarial networks (GANs) achieve joint training by training the generative model through an adversarial process against a discriminative model. However, the training of standard GANs is notoriously unstable and often fails to converge. In this paper, to address these issues, we propose a $\textit{self-consistent learning}$ framework, in which a discriminator and a generator are cooperatively trained in a closed loop. The two models enhance each other over multiple rounds of alternating training until they reach a scoring consensus. The framework proves easy to train and free from instabilities such as mode collapse and non-convergence. Extensive experiments on sentence semantic matching demonstrate its effectiveness: the discriminator improves by 10+ AP in the zero-shot setting and achieves new state-of-the-art performance in the full-data setting.
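The closed-loop idea can be illustrated with a toy sketch. This is not the paper's implementation: the function `self_consistent_rounds`, the `ToyModel` class, the agreement threshold, the tolerance, and the "nudge toward the other model" update are all hypothetical stand-ins chosen only to show the control flow of alternating rounds that stop once the two models' scores roughly agree.

```python
from statistics import mean
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]  # a candidate sentence pair (hypothetical data type)


def self_consistent_rounds(
    propose: Callable[[], Sequence[Pair]],
    gen_score: Callable[[Pair], float],
    disc_score: Callable[[Pair], float],
    update_gen: Callable[[List[Pair]], None],
    update_disc: Callable[[List[Pair]], None],
    threshold: float = 0.5,   # assumed cutoff for "both models accept"
    tol: float = 0.05,        # assumed definition of "scoring consensus"
    max_rounds: int = 20,
) -> Tuple[int, float]:
    """Alternate updates until the two models' scores roughly agree.

    Assumes `propose` returns at least one candidate.
    Returns (number of update rounds run, final mean score gap).
    """
    rounds_run = 0
    while True:
        candidates = list(propose())
        gap = mean(abs(gen_score(c) - disc_score(c)) for c in candidates)
        if gap <= tol or rounds_run >= max_rounds:
            return rounds_run, gap
        # Keep only the samples that both models currently rate as positive.
        agreed = [c for c in candidates
                  if gen_score(c) >= threshold and disc_score(c) >= threshold]
        update_disc(agreed)  # discriminator step on the filtered samples
        update_gen(agreed)   # generator step on the same filtered samples
        rounds_run += 1


class ToyModel:
    """Hypothetical stand-in for a model: one scalar score for every pair."""

    def __init__(self, score: float) -> None:
        self.score = score

    def __call__(self, pair: Pair) -> float:
        return self.score

    def nudge_toward(self, target: float, rate: float = 0.5) -> None:
        # Placeholder for a real training step.
        self.score += rate * (target - self.score)


gen, disc = ToyModel(0.9), ToyModel(0.6)
pairs = [("a cat sleeps", "a cat is asleep")]

rounds, gap = self_consistent_rounds(
    propose=lambda: pairs,
    gen_score=gen,
    disc_score=disc,
    update_gen=lambda kept: gen.nudge_toward(disc.score) if kept else None,
    update_disc=lambda kept: disc.nudge_toward(gen.score) if kept else None,
)
print(rounds, gap)
```

In a real system the two update callbacks would be fine-tuning steps on pre-trained models, and the stopping rule would follow whatever consensus criterion the method actually defines; the sketch only shows why a cooperative loop of this shape has a natural termination point, unlike an adversarial one.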