While humans process language incrementally, the best language encoders currently used in NLP do not. Both bidirectional LSTMs and Transformers assume that the sequence to be encoded is available in full, to be processed both forwards and backwards (BiLSTMs) or as a whole (Transformers). We investigate how they behave under incremental interfaces, where partial output must be provided based on the partial input seen up to a certain time step, as may happen in interactive systems. We test five models on various NLU datasets and compare their performance using three incremental evaluation metrics. The results support the possibility of using bidirectional encoders in incremental mode while retaining most of their non-incremental quality. The "omni-directional" BERT model, which achieves better non-incremental performance, is impacted more by incremental access. This can be alleviated by adapting the training regime (truncated training) or the testing procedure, by delaying the output until some right context is available or by incorporating hypothetical right contexts generated by a language model like GPT-2.
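The incremental interface described above can be sketched as a restart-incremental loop: at each time step the growing prefix is re-encoded in full, and outputs are committed either immediately or after a delay that grants each token some right context. The following toy Python sketch illustrates only this control flow; the `tag_prefix` function is a hypothetical stand-in, not one of the paper's actual encoders.

```python
# Restart-incremental interface (illustrative sketch, not the paper's models).
# At each time step the whole prefix is re-encoded; with delay d, the label
# for token i is only committed once i + d tokens have been observed
# ("delayed output"), so the token has some right context.

def tag_prefix(prefix):
    # Hypothetical stand-in for a bidirectional encoder re-run on the prefix:
    # the prefix-final token (no right context yet) is marked "EDIT-PRONE",
    # all earlier tokens "STABLE" -- purely for illustration.
    return ["EDIT-PRONE" if i == len(prefix) - 1 else "STABLE"
            for i in range(len(prefix))]

def incremental_tag(tokens, delay=0):
    committed = []                           # labels already output
    for t in range(1, len(tokens) + 1):
        labels = tag_prefix(tokens[:t])      # restart: re-encode the prefix
        ready = t - delay                    # tokens with >= delay right context
        while len(committed) < ready:
            committed.append(labels[len(committed)])
    # flush any remaining labels once the full sequence is available
    final = tag_prefix(tokens)
    committed += final[len(committed):]
    return committed

# With no delay, every token is labeled while it is still prefix-final;
# with delay=1, only the genuinely last token lacks right context.
print(incremental_tag(["the", "cat", "sat"], delay=0))
print(incremental_tag(["the", "cat", "sat"], delay=1))
```

Under this toy labeler, `delay=0` commits every token without right context, while `delay=1` stabilizes all but the final token, mirroring the trade-off between output timeliness and output quality that the delayed-output strategy exploits.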