We aim to capture high-order statistics of feature vectors formed by a neural network, and propose end-to-end second- and higher-order pooling to form a tensor descriptor. Tensor descriptors require a robust similarity measure because few vectors are aggregated and because of the burstiness phenomenon, in which a given feature appears more or less frequently than statistically expected. The Heat Diffusion Process (HDP) on a graph Laplacian is closely related to the Eigenvalue Power Normalization (EPN) of the covariance/auto-correlation matrix, whose inverse forms a loopy graph Laplacian. We show that the HDP and the EPN play the same role, i.e., they boost or dampen the magnitude of the eigenspectrum, thus preventing burstiness. We equip higher-order tensors with EPN, which acts as a spectral detector of higher-order occurrences to prevent burstiness. We also prove that for a tensor of order r built from d-dimensional feature descriptors, such a detector gives the likelihood that at least one higher-order occurrence is 'projected' into one of the binom(d,r) subspaces represented by the tensor, thus forming a tensor power normalization metric endowed with binom(d,r) such 'detectors'. As experimental contributions, we apply several second- and higher-order pooling variants to action recognition, provide previously unreported comparisons of such pooling variants, and show state-of-the-art results on HMDB-51, YUP++ and MPII Cooking Activities.
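To make the second-order case concrete, the following is a minimal sketch of pooling feature vectors into an auto-correlation matrix and applying a power to its eigenvalues. It is an illustration under stated assumptions, not the authors' implementation: the function names, the exponent `gamma = 0.5`, and the synthetic "bursty" features are all hypothetical, and the abstract does not specify which EPN variant is used.

```python
import numpy as np


def second_order_pool(features):
    """Auto-correlation matrix M = (1/N) * sum_n phi_n phi_n^T.

    `features` has shape (N, d): N feature vectors of dimension d.
    """
    n = features.shape[0]
    return features.T @ features / n


def eigenvalue_power_normalization(m, gamma=0.5):
    """Raise the eigenvalues of a symmetric PSD matrix to the power gamma.

    Assumption: a plain power function with 0 < gamma <= 1, which dampens
    large eigenvalues relative to small ones.
    """
    eigvals, eigvecs = np.linalg.eigh(m)
    # Guard against tiny negative eigenvalues caused by round-off.
    eigvals = np.clip(eigvals, 0.0, None)
    # Scale each eigenvector column by its powered eigenvalue, then recompose.
    return (eigvecs * eigvals**gamma) @ eigvecs.T


# Synthetic example: few aggregated vectors, one dominant ("bursty") direction.
rng = np.random.default_rng(0)
features = rng.standard_normal((8, 5))
features[:, 0] *= 10.0

m = second_order_pool(features)
m_epn = eigenvalue_power_normalization(m, gamma=0.5)

# Ratio of largest to smallest eigenvalue before and after normalization.
spread_before = np.linalg.eigvalsh(m)
spread_after = np.linalg.eigvalsh(m_epn)
ratio_before = spread_before[-1] / spread_before[0]
ratio_after = spread_after[-1] / spread_after[0]
```

With `gamma = 0.5` the result is the matrix square root of `m`, so the ratio between the largest and smallest eigenvalue shrinks to its square root. This is the dampening effect on the eigenspectrum that the abstract associates with preventing burstiness.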