Autoregressive neural sequence models have proven effective across a wide range of text generation tasks. However, their left-to-right decoding order prevents generation from being parallelized. The Insertion Transformer (Stern et al., 2019) is an attractive alternative that can output multiple tokens in a single generation step. Nevertheless, because absolute positional encoding is incompatible with insertion-based generation, it must recompute the encoding of every token in the partial hypothesis at each step, which can be costly. We design a novel reusable positional encoding scheme for Insertion Transformers, called Fractional Positional Encoding (FPE), which allows representations computed in previous steps to be reused. Empirical studies on various text generation tasks demonstrate the effectiveness of FPE, which reduces floating-point operations and improves latency in batched decoding.
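To make the incompatibility concrete, the Python sketch below contrasts absolute positions, where every insertion re-indexes all tokens to its right, with a hypothetical fractional assignment in which a new token takes the midpoint of its neighbours' positions. The midpoint rule, the (0, 1) position range, and the function names `insert_absolute`, `insert_fractional`, and `stale` are assumptions made for illustration. The abstract does not describe how FPE assigns positions or maps them to encodings, so the sketch only demonstrates the reuse property that FPE targets.

```python
from fractions import Fraction

# Hypothetical illustration, NOT the paper's FPE: a partial hypothesis is a
# list of (token, position) pairs in left-to-right order. Tokens are assumed
# unique so they can serve as dictionary keys in `stale`.


def insert_absolute(seq, insertions):
    """Insert tokens into slots (slot i = before the i-th token), at most one
    per slot as in one Insertion Transformer step, then re-number every token
    by its new index, as absolute positional encoding requires."""
    tokens = [tok for tok, _ in seq]
    for slot in sorted(insertions, reverse=True):  # right-to-left keeps slot indices valid
        tokens.insert(slot, insertions[slot])
    return [(tok, i) for i, tok in enumerate(tokens)]


def insert_fractional(seq, insertions, lo=Fraction(0), hi=Fraction(1)):
    """Assumed midpoint rule: each new token gets the midpoint of its original
    neighbours' positions, so tokens already present never move."""
    seq = list(seq)
    for slot in sorted(insertions, reverse=True):  # right-to-left: neighbours are still the original ones
        left = seq[slot - 1][1] if slot > 0 else lo
        right = seq[slot][1] if slot < len(seq) else hi
        seq.insert(slot, (insertions[slot], (left + right) / 2))
    return seq


def stale(before, after):
    """Tokens whose position changed, i.e. whose cached states must be recomputed."""
    old = dict(before)
    return [tok for tok, pos in after if tok in old and old[tok] != pos]


if __name__ == "__main__":
    abs_seq, frac_seq = [], []
    for step in [{0: "B"}, {0: "A", 1: "D"}, {2: "C"}]:
        new_abs = insert_absolute(abs_seq, step)
        new_frac = insert_fractional(frac_seq, step)
        print("absolute stale:", stale(abs_seq, new_abs),
              "| fractional stale:", stale(frac_seq, new_frac))
        abs_seq, frac_seq = new_abs, new_frac
    print([tok for tok, _ in frac_seq], [str(p) for _, p in frac_seq])
```

Running it prints no stale tokens at the first step, then `['B']` and `['D']` as stale under absolute positions at the second and third steps, while the fractional positions stay fixed, ending at `1/4, 1/2, 5/8, 3/4` for `A B C D`. Exact rationals are used only to keep the example precise; repeated midpoint splitting needs ever finer precision, which a practical encoding would have to handle.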