Category text generation has received considerable attention because it benefits a variety of natural language processing tasks. Recently, generative adversarial networks (GANs) have achieved promising performance in text generation, which is attributed to their adversarial training process. However, text GANs still suffer from several issues, including the discreteness of text, training instability, mode collapse, and a lack of diversity and controllability. To address these issues, this paper proposes a novel GAN framework, the feature-aware conditional GAN (FA-GAN), for controllable category text generation. In FA-GAN, the generator has a sequence-to-sequence structure to improve sentence diversity; it consists of three encoders, including a special feature-aware encoder and a category-aware encoder, and one relational-memory-core-based decoder with the Gumbel-Softmax activation function. The discriminator has an additional category classification head. To generate sentences of specified categories, a multi-class classification loss is added to the adversarial training objective. Comprehensive experiments show that FA-GAN consistently outperforms 10 state-of-the-art text generation approaches on 6 text classification datasets. A case study demonstrates that the synthetic sentences generated by FA-GAN match the required categories and reflect the features of the conditioning sentences, with good readability, fluency, and text authenticity.
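Two of the ingredients named in the abstract, the Gumbel-Softmax relaxation in the decoder and the classification loss added to adversarial training, can be illustrated with a minimal sketch. This is not the FA-GAN implementation: the function names `gumbel_softmax` and `combined_loss`, the `temperature` and `weight` parameters, and the simple additive form of the loss are assumptions made for illustration only.

```python
import math
import random


def gumbel_softmax(logits, temperature=1.0, rng=random):
    """Return a relaxed (differentiable in principle) one-hot sample.

    Hypothetical helper: perturbs each logit with Gumbel(0, 1) noise and
    applies a temperature-scaled softmax, which sidesteps the discrete
    argmax that blocks gradients in text GANs.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    eps = 1e-20  # guards against log(0)
    noisy = []
    for logit in logits:
        u = rng.random()
        # Gumbel(0, 1) noise: g = -log(-log(u)), with u ~ Uniform(0, 1)
        g = -math.log(-math.log(u + eps) + eps)
        noisy.append((logit + g) / temperature)
    # Numerically stable softmax over the perturbed logits
    peak = max(noisy)
    exps = [math.exp(x - peak) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def combined_loss(p_real, is_real, category_probs, category, weight=1.0):
    """Adversarial loss plus a multi-class classification term.

    Assumed form (not taken from the paper): binary cross-entropy on the
    real/fake output, plus `weight` times the cross-entropy of the
    category head against the specified category index.
    """
    eps = 1e-12
    target = 1.0 if is_real else 0.0
    adversarial = -(target * math.log(p_real + eps)
                    + (1.0 - target) * math.log(1.0 - p_real + eps))
    classification = -math.log(category_probs[category] + eps)
    return adversarial + weight * classification


if __name__ == "__main__":
    # Toy vocabulary of three tokens; lower temperatures give sharper samples.
    print(gumbel_softmax([2.0, 0.5, -1.0], temperature=0.5))
    # Real sample that the category head assigns mostly to class 0.
    print(combined_loss(0.9, True, [0.7, 0.2, 0.1], category=0))
```

Lowering `temperature` pushes the sample toward a one-hot vector, while higher values give smoother distributions; how FA-GAN schedules this value and weights the classification term is not stated in the abstract.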