Incremental processing allows interactive systems to respond based on partial input, a desirable property, e.g., in dialogue agents. The currently popular Transformer architecture inherently processes sequences as a whole, abstracting away the notion of time. Recent work attempts to apply Transformers incrementally via restart-incrementality, by repeatedly feeding increasingly longer input prefixes to an unchanged model to produce partial outputs. However, this approach is computationally costly and does not scale efficiently for long sequences. In parallel, we witness efforts to make Transformers more efficient, e.g., the Linear Transformer (LT) with a recurrence mechanism. In this work, we examine the feasibility of LT for incremental NLU in English. Our results show that the recurrent LT model has better incremental performance and faster inference speed compared to the standard Transformer and LT with restart-incrementality, at the cost of some non-incremental (full-sequence) quality. We show that the performance drop can be mitigated by training the model to wait for right context before committing to an output, and that training with input prefixes is beneficial for delivering correct partial outputs.
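The cost difference between the two strategies can be made concrete with a toy sketch (these are stand-in functions, not the actual models): restart-incrementality re-runs the full model on every prefix, so the total work grows quadratically in sequence length, while a recurrent model carries a state forward and pays a constant amount of work per new token.

```python
def restart_incremental(tokens, model):
    """Re-run the model on every prefix; total cost is 1 + 2 + ... + n."""
    outputs, cost = [], 0
    for t in range(1, len(tokens) + 1):
        prefix = tokens[:t]
        cost += len(prefix)          # the whole prefix is re-processed
        outputs.append(model(prefix))
    return outputs, cost

def recurrent_incremental(tokens, step):
    """Carry a state forward; each new token costs one model step."""
    outputs, cost, state = [], 0, None
    for tok in tokens:
        state, out = step(state, tok)
        cost += 1                    # only the new token is processed
        outputs.append(out)
    return outputs, cost

# Placeholder "models" just to count work; n = 11 tokens here.
tokens = list("incremental")
_, restart_cost = restart_incremental(tokens, lambda prefix: len(prefix))
_, recurrent_cost = recurrent_incremental(tokens, lambda s, t: (s, t))
# restart-incrementality: n*(n+1)/2 = 66 token computations; recurrent: n = 11
```

The quadratic-vs-linear gap is what makes restart-incrementality impractical for long sequences and motivates the recurrent LT formulation examined in the paper.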