Most existing text generation models follow the sequence-to-sequence paradigm. Generative Grammar, by contrast, suggests that humans generate natural language texts by learning the grammar of the language. Motivated by this, we propose a syntax-guided generation schema that generates a sequence top-down, guided by a constituency parse tree. The decoding process decomposes into two parts: (1) predicting the infilling text for each constituent in the lexicalized syntax context, given the source sentence; and (2) mapping and expanding each constituent to construct the next-level syntax context. Accordingly, we propose a structural beam search method that searches for possible syntax structures hierarchically. Experiments on paraphrase generation and machine translation show that the proposed method outperforms autoregressive baselines, while also proving effective in terms of interpretability, controllability, and diversity.
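To make the two-part decoding loop concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a hand-written table of infills (`TOY_INFILLS`, hypothetical) in place of a learned model, and it ignores the source sentence entirely, whereas the proposed method conditions each prediction on it. Constituents are written as placeholder tokens such as `<NP>`. Each level first predicts an infill for every constituent in the current context, then splices those infills in to form the next-level context, and a beam keeps the highest-scoring contexts at each level.

```python
import heapq
import math
from itertools import product

# Hypothetical stand-in for a learned model: each constituent label maps to
# candidate infills with probabilities. An infill may itself contain
# lower-level constituents (e.g. "<NP>" inside a "<VP>" infill).
TOY_INFILLS = {
    "<S>": [(("<NP>", "<VP>"), 1.0)],
    "<NP>": [(("the", "cat"), 0.6), (("a", "dog"), 0.4)],
    "<VP>": [(("sleeps",), 0.7), (("chases", "<NP>"), 0.3)],
}


def is_constituent(token):
    """Placeholder tokens such as "<NP>" mark unexpanded constituents."""
    return token.startswith("<") and token.endswith(">")


def is_complete(context):
    """A context is complete once it contains only words."""
    return not any(is_constituent(t) for t in context)


def expand_level(context, beam_size):
    """Run one decoding level on a single syntax context.

    Part 1: look up candidate infills for every constituent in the context.
    Part 2: splice the chosen infills in to build the next-level context.
    Returns up to `beam_size` pairs of (log-probability gain, new context).
    """
    slots = [
        TOY_INFILLS[t] if is_constituent(t) else [((t,), 1.0)]
        for t in context
    ]
    candidates = []
    for combo in product(*slots):
        logp = sum(math.log(p) for _, p in combo)
        new_context = tuple(tok for infill, _ in combo for tok in infill)
        candidates.append((logp, new_context))
    return heapq.nlargest(beam_size, candidates)


def structural_beam_search(beam_size=2, max_depth=5):
    """Expand contexts level by level, keeping the best `beam_size` each time."""
    beam = [(0.0, ("<S>",))]
    for _ in range(max_depth):
        if all(is_complete(ctx) for _, ctx in beam):
            break
        next_beam = []
        for score, ctx in beam:
            if is_complete(ctx):
                next_beam.append((score, ctx))  # carry finished hypotheses over
                continue
            for gain, new_ctx in expand_level(ctx, beam_size):
                next_beam.append((score + gain, new_ctx))
        beam = heapq.nlargest(beam_size, next_beam)
    return beam


if __name__ == "__main__":
    for log_prob, tokens in structural_beam_search(beam_size=2):
        print(round(math.exp(log_prob), 3), " ".join(tokens))
```

Because the search proceeds over whole syntax contexts rather than single tokens, intermediate beams such as `<NP> <VP>` expose the structure being built at each level, which is one way to read the interpretability and controllability claims above.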