Many natural phenomena encode both short- and long-term temporal dependencies, with the short-term dependencies arising in particular from the direction of the flow of time. We report experimental evidence suggesting that the {\it interrelations} between such events are stronger when their time stamps are closer. However, attention-based models need large amounts of data to learn these regularities in short-term dependencies, and collecting that much data is often infeasible. The reason is that, although attention-based models are good at learning piecewise temporal dependencies, they lack structures that encode the biases inherent in time series. To address this, we propose a simple and efficient method that enables attention layers to better encode the short-term temporal bias of such data sets by applying learnable, adaptive kernels directly to the attention matrices. For our experiments, we chose various prediction tasks on Electronic Health Records (EHR) data sets, since they are good examples of data with underlying long- and short-term temporal dependencies. The results show exceptional classification performance relative to the best-performing models on most tasks and data sets.
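To make the idea of applying a kernel directly to an attention matrix concrete, the following is a minimal sketch and not the authors' implementation. It assumes an exponential kernel of the form exp(-|t_i - t_j| / bandwidth), applied multiplicatively to the softmax attention weights and followed by row renormalization. The kernel form, the function names, and the `bandwidth` parameter are all illustrative assumptions. In a real model the bandwidth would be a parameter trained by gradient descent; here it is a plain float.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_kernel(timestamps, bandwidth):
    """Assumed kernel: exp(-|t_i - t_j| / bandwidth), equal to 1 on the diagonal."""
    return [[math.exp(-abs(ti - tj) / bandwidth) for tj in timestamps]
            for ti in timestamps]


def kernelized_attention(scores, timestamps, bandwidth):
    """Softmax the raw scores row-wise, reweight by the kernel, renormalize rows."""
    attn = [softmax(row) for row in scores]
    kernel = temporal_kernel(timestamps, bandwidth)
    out = []
    for a_row, k_row in zip(attn, kernel):
        weighted = [a * k for a, k in zip(a_row, k_row)]
        total = sum(weighted)
        out.append([w / total for w in weighted])
    return out


# Three events: the first two are close in time, the third is far away.
# Uniform raw scores isolate the effect of the kernel.
scores = [[0.0, 0.0, 0.0] for _ in range(3)]
timestamps = [0.0, 1.0, 10.0]
weights = kernelized_attention(scores, timestamps, bandwidth=2.0)
```

With uniform raw scores, each row of `weights` still sums to one, but events with closer time stamps receive more attention mass. This is the kind of short-term bias the abstract describes, here under the assumed kernel.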