In NLP, incremental processors produce output in instalments, based on incoming prefixes of the linguistic input. Some tokens trigger revisions, causing edits to the output hypothesis, but little is known about why models revise when they do. A policy that detects the time steps at which revisions should happen can improve efficiency. Still, obtaining a suitable signal to train such a revision policy is an open problem, since it is not naturally available in datasets. In this work, we investigate whether regressions and skips in human eye-tracking reading data are appropriate signals to inform revision policies in incremental sequence labelling. Using generalised mixed-effects models, we find that the probability of regressions and skips by humans can potentially serve as useful predictors of revisions in BiLSTMs and Transformer models, with consistent results across various languages.
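The notion of a revision can be made concrete with a minimal sketch. It assumes a restart-incremental setup in which the labeller relabels the entire prefix at every time step, and it counts a step as a revision whenever any label already output at the previous step changes. The function name `revision_steps` and the toy part-of-speech labels are hypothetical illustrations, not taken from the paper.

```python
from typing import List, Sequence


def revision_steps(prefix_outputs: Sequence[Sequence[str]]) -> List[bool]:
    """For each time step t, return True if the hypothesis at t edits any
    label that was already output at step t-1 (i.e. a revision occurred).

    Assumption (hypothetical setup): prefix_outputs[t] holds the labels the
    model assigns to the first t+1 tokens after reading token t.
    """
    flags: List[bool] = []
    for t, current in enumerate(prefix_outputs):
        if t == 0:
            # Nothing has been output yet, so the first step cannot revise.
            flags.append(False)
            continue
        previous = prefix_outputs[t - 1]
        # zip stops at the shorter hypothesis, so only labels for tokens
        # covered by both steps are compared; the newly added label is ignored.
        flags.append(any(p != c for p, c in zip(previous, current)))
    return flags


# Toy garden-path example: "the old man the boat".
# Reading the second "the" forces "old" to be relabelled ADJ -> NOUN
# and "man" NOUN -> VERB, so step 4 (index 3) is a revision.
outputs = [
    ["DET"],
    ["DET", "ADJ"],
    ["DET", "ADJ", "NOUN"],
    ["DET", "NOUN", "VERB", "DET"],
    ["DET", "NOUN", "VERB", "DET", "NOUN"],
]
print(revision_steps(outputs))
```

Per-token binary flags of this kind are the sort of response variable that a mixed-effects logistic model could relate to human regression and skip probabilities, with tokens or sentences as grouping factors.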