Fitting generative models to sequential data typically involves two recursive computations through time, one forward and one backward. The backward computation may be the loss gradient (as in backpropagation through time) or an inference algorithm (as in the RTS/Kalman smoother). The backward pass in particular is computationally expensive, since it is inherently serial and cannot exploit the parallelism of GPUs, and it is difficult to map onto biological processes. Workarounds have been proposed; here we explore a very different one: requiring the generative model to learn the joint distribution over current and previous states, rather than merely the transition probabilities. We show on toy datasets that different architectures employing this principle can learn aspects of the data that typically require the backward pass.
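To make the distinction between transition probabilities and a joint distribution concrete, here is a minimal sketch. It is not one of the architectures studied here. It assumes a discrete two-state chain with made-up probabilities, and the names `joint`, `forward_conditional`, and `backward_conditional` are hypothetical. A transition model stores only p(z_t | z_{t-1}), whereas a joint p(z_{t-1}, z_t) can be conditioned in either direction, so it also yields a backward-looking conditional p(z_{t-1} | z_t).

```python
# Illustrative numbers only (an assumption, not data from this work).
# joint[i][j] = p(z_{t-1} = i, z_t = j); entries sum to 1.
joint = [[0.4, 0.1],
         [0.2, 0.3]]


def forward_conditional(joint):
    """p(z_t = j | z_{t-1} = i): normalize each row of the joint."""
    return [[p / sum(row) for p in row] for row in joint]


def backward_conditional(joint):
    """p(z_{t-1} = i | z_t = j): normalize each column of the joint."""
    n_cols = len(joint[0])
    col_sums = [sum(row[j] for row in joint) for j in range(n_cols)]
    return [[row[j] / col_sums[j] for j in range(n_cols)] for row in joint]


# Same indexing as the joint: entry [i][j] pairs z_{t-1} = i with z_t = j.
forward = forward_conditional(joint)    # rows sum to 1
backward = backward_conditional(joint)  # columns sum to 1
```

In this toy setting the forward table alone cannot be inverted into the backward one without also knowing the marginal over z_{t-1}; the joint carries both pieces of information at once.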