Humans read text at a varying pace, whereas machine learning models spend the same amount of computation on every token. We therefore ask whether it helps to make models behave more like humans. In this paper, we turn this intuition into a set of novel models with fixation-guided parallel RNNs or layers, and we test their effectiveness through experiments on language modeling and sentiment analysis, thus providing empirical validation for the intuition. The proposed models perform well on the language modeling task, considerably surpassing the baseline model. Interestingly, we also find that the fixation durations predicted by the neural networks bear some resemblance to human fixations: without any explicit guidance, the model makes choices similar to those of humans. Finally, we investigate the reasons for the differences between the two, which explain why "model fixations" are often more suitable than human fixations for guiding language models.