This paper presents a novel unifying framework of bilinear LSTMs that can represent and exploit the nonlinear interactions among input features in sequence datasets, achieving better performance than a linear LSTM without increasing the number of parameters to be learned. To realize this, the framework balances the expressivity of the linear and bilinear terms by trading off the size of the hidden state vector against the approximation quality of the weight matrix in the bilinear term, so that the performance of the bilinear LSTM can be optimized at no additional parameter cost. We empirically evaluate our bilinear LSTM on several language-based sequence learning tasks to demonstrate its general applicability.
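The trade-off between hidden state size and the approximation quality of the bilinear weight matrix can be illustrated with a simple parameter count. The sketch below is not the formulation used in this paper. It assumes a hypothetical low-rank bilinear term of the form P((Uz) ⊙ (Vz)) per gate, where z is the concatenation of the input and the previous hidden state, U and V have shape rank × (input_dim + hidden_dim), and P has shape hidden_dim × rank. All function names and the factorization itself are assumptions made for illustration only.

```python
def linear_lstm_params(input_dim: int, hidden_dim: int) -> int:
    """Parameter count of a standard LSTM: 4 gates, each with a weight
    matrix over [x; h] and a bias vector."""
    m = input_dim + hidden_dim
    return 4 * (hidden_dim * m + hidden_dim)


def bilinear_lstm_params(input_dim: int, hidden_dim: int, rank: int) -> int:
    """Parameter count when each gate also has a rank-`rank` bilinear term.

    Assumed (hypothetical) factorization per gate: P @ ((U @ z) * (V @ z)),
    with U, V of shape (rank, m) and P of shape (hidden_dim, rank).
    """
    m = input_dim + hidden_dim
    linear = hidden_dim * m + hidden_dim
    bilinear = 2 * rank * m + hidden_dim * rank
    return 4 * (linear + bilinear)


def max_rank_within_budget(input_dim: int, hidden_dim: int, budget: int):
    """Largest rank whose total parameter count stays within `budget`.

    Returns None if even the purely linear part exceeds the budget.
    """
    m = input_dim + hidden_dim
    base = linear_lstm_params(input_dim, hidden_dim)
    if base > budget:
        return None
    per_rank = 4 * (2 * m + hidden_dim)  # cost of one extra rank, all gates
    return (budget - base) // per_rank


if __name__ == "__main__":
    # Budget: a linear LSTM with 100 input features and 256 hidden units.
    budget = linear_lstm_params(100, 256)
    for hidden_dim in (256, 224, 192, 160):
        rank = max_rank_within_budget(100, hidden_dim, budget)
        print(f"hidden_dim={hidden_dim:4d}  max rank={rank}")
```

Under these assumptions, shrinking the hidden state frees parameters that can be spent on a higher-rank (better) approximation of the bilinear weight matrix, while the total stays at or below that of the linear LSTM used as the budget.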