Graph representation learning has become a hot research topic due to its powerful nonlinear fitting capability in extracting representative node embeddings. However, for sequential data such as speech signals, most traditional methods focus only on a static graph built within a sequence and largely overlook the intrinsic evolving patterns of the data. This may reduce the efficiency of graph representation learning for sequential data. For this reason, we propose an adaptive graph representation learning method based on dynamically evolved graphs, which are constructed consecutively on a series of subsequences segmented by a sliding window. This construction better captures both local and global context information within a long sequence. Moreover, we update the node representation with a weighted approach instead of conventional averaging, where the weights are calculated by a novel matrix computation based on the degree of neighboring nodes. Finally, we construct a learnable graph convolutional layer that combines a graph structure loss and a classification loss to optimize the graph structure. To verify the effectiveness of the proposed method, we conducted speech emotion recognition experiments on the IEMOCAP and RAVDESS datasets. Experimental results show that the proposed method outperforms the latest graph-based and non-graph-based models.
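The sliding-window graph construction and the degree-based weighted update described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the graph topology or the weight matrix, so the chain graph in `chain_adjacency`, the GCN-style symmetric normalization with weights 1/sqrt(d_i * d_j) in `degree_weighted_update`, and all function names and parameters (`window`, `hop`) are assumptions chosen for illustration. The learnable graph structure and the combined loss are omitted.

```python
import numpy as np

def sliding_windows(frames, window, hop):
    """Split a (T, d) feature sequence into overlapping (window, d) subsequences."""
    starts = range(0, frames.shape[0] - window + 1, hop)
    return [frames[s:s + window] for s in starts]

def chain_adjacency(n):
    """Assumed topology: each frame is linked to its temporal neighbours."""
    adj = np.zeros((n, n))
    idx = np.arange(n - 1)
    adj[idx, idx + 1] = 1.0
    adj[idx + 1, idx] = 1.0
    return adj

def degree_weighted_update(adj, feats):
    """Aggregate neighbours with degree-based weights 1/sqrt(d_i * d_j).

    This is the standard GCN normalization, used here as a stand-in for
    the paper's own (unspecified) degree-based matrix computation.
    """
    a_hat = adj + np.eye(adj.shape[0])        # add self-loops
    deg = a_hat.sum(axis=1)                   # node degrees (incl. self-loop)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    weights = d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]
    return weights @ feats

def embed_sequence(frames, window=4, hop=2):
    """Build one graph per subsequence and pool its updated node features."""
    pooled = []
    for sub in sliding_windows(frames, window, hop):
        adj = chain_adjacency(sub.shape[0])
        pooled.append(degree_weighted_update(adj, sub).mean(axis=0))
    return np.stack(pooled)                   # shape: (num_windows, d)
```

With a 10-frame sequence, `window=4` and `hop=2` yield four overlapping subsequences, so `embed_sequence` returns one pooled vector per subsequence. The overlap between consecutive windows is what lets successive graphs share context along the sequence.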