Using generated data to improve the performance of downstream discriminative models has recently gained popularity, driven by rapid progress in pre-trained language models. In most previous studies, the generative and discriminative models are trained separately and therefore cannot adapt to changes in each other. As a result, the generated samples can easily deviate from the real data distribution, while the improvement of the discriminative model quickly saturates. Generative adversarial networks (GANs) achieve joint training by training the generative model through an adversarial process against a discriminative model. However, the training of standard GANs is notoriously unstable and often fails to converge. To address these issues, we propose a $\textit{self-consistent learning}$ framework, in which a discriminator and a generator are trained cooperatively in a closed loop. The discriminator and the generator enhance each other over multiple rounds of alternating training until they reach a scoring consensus. This framework proves to be easy to train and free from instabilities such as mode collapse and non-convergence. Extensive experiments on sentence semantic matching demonstrate the effectiveness of the proposed framework: the discriminator improves by more than 10 AP in the zero-shot setting and achieves new state-of-the-art performance in the full-data setting.
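The closed-loop idea described above (alternating rounds that continue until the two models agree on how to score samples) can be illustrated with a deliberately simplified toy. This is only a sketch under strong assumptions and not the training procedure of the paper: each "model" is reduced to a single decision threshold on a one-dimensional score, "generated samples" are uniform random numbers, and the names `label`, `refit`, and `self_consistent_loop`, as well as the tolerance and learning rate, are hypothetical choices made for illustration.

```python
import random


def label(threshold, x):
    """Toy scorer: call a sample a 'match' if its score reaches the threshold."""
    return x >= threshold


def refit(threshold, xs, labels, lr=0.5):
    """Toy 'training' step: move the threshold toward the boundary implied by labels."""
    pos = [x for x, y in zip(xs, labels) if y]
    neg = [x for x, y in zip(xs, labels) if not y]
    if not pos or not neg:
        return threshold  # nothing to learn from one-sided labels
    target = (min(pos) + max(neg)) / 2
    return threshold + lr * (target - threshold)


def self_consistent_loop(d_thr=0.3, g_thr=0.7, rounds=20, n=200, tol=0.02, seed=0):
    """Alternate updates until the two scorers disagree on at most `tol` of samples."""
    rng = random.Random(seed)
    history = []  # disagreement rate measured at the start of each round
    for _ in range(rounds):
        xs = [rng.random() for _ in range(n)]  # stand-in for generated samples
        d_labels = [label(d_thr, x) for x in xs]
        g_labels = [label(g_thr, x) for x in xs]
        disagreement = sum(a != b for a, b in zip(d_labels, g_labels)) / n
        history.append(disagreement)
        if disagreement <= tol:
            break  # scoring consensus reached (toy criterion)
        # Discriminator step: learn from the generator's current labels.
        d_thr = refit(d_thr, xs, g_labels)
        # Generator step: learn from the updated discriminator's labels.
        g_thr = refit(g_thr, xs, [label(d_thr, x) for x in xs])
    return d_thr, g_thr, history


if __name__ == "__main__":
    d, g, hist = self_consistent_loop()
    print(f"discriminator threshold: {d:.3f}, generator threshold: {g:.3f}")
    print("disagreement per round:", [round(h, 3) for h in hist])
```

In this toy, each update pulls one threshold toward the other, so the disagreement rate shrinks from round to round and the loop stops once it falls below the tolerance. A real instance would replace the thresholds with a pre-trained generator and discriminator and the refit step with actual fine-tuning on the agreed-upon samples.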