Visual representations based on the covariance matrix have demonstrated their efficacy for image classification by characterising the pairwise correlation of different channels in convolutional feature maps. However, pairwise correlation becomes misleading once another channel correlates with both channels of interest, resulting in the ``confounding'' effect. In this case, ``partial correlation'', which removes the confounding effect, should be estimated instead. Nevertheless, reliably estimating partial correlation requires solving a symmetric positive definite matrix optimisation problem, known as sparse inverse covariance estimation (SICE). How to incorporate this process into a CNN remains an open issue. In this work, we formulate SICE as a novel structured layer of the CNN. To ensure end-to-end trainability, we develop an iterative method to solve the above matrix optimisation during the forward and backward propagation steps. Our work obtains a partial correlation based deep visual representation and mitigates the small sample problem often encountered by covariance matrix estimation in CNNs. Computationally, our model can be trained efficiently on a GPU and works well with the large number of channels of advanced CNNs. Experiments show the efficacy and superior classification performance of our deep visual representation compared to covariance matrix based counterparts.
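The confounding effect described above can be illustrated with a small numerical sketch. It is not the SICE layer of this work: it assumes three synthetic "channels" (the names `x`, `y`, `z` and the helper `partial_correlation` are hypothetical), and it obtains partial correlations by inverting the sample covariance directly, without the sparsity penalty that SICE adds.

```python
import numpy as np


def partial_correlation(cov):
    """Partial correlations from a covariance matrix via its inverse.

    Uses the standard identity rho_ij = -P_ij / sqrt(P_ii * P_jj),
    where P is the precision (inverse covariance) matrix.
    Assumption: cov is well conditioned, so plain inversion is acceptable.
    """
    precision = np.linalg.inv(cov)
    d = np.sqrt(np.diag(precision))
    pcorr = -precision / np.outer(d, d)
    np.fill_diagonal(pcorr, 1.0)
    return pcorr


# Synthetic data: channel z drives both x and y (z is the confounder).
rng = np.random.default_rng(0)
n = 5000
z = rng.standard_normal(n)
x = z + 0.5 * rng.standard_normal(n)
y = z + 0.5 * rng.standard_normal(n)

features = np.stack([x, y, z])   # shape (3, n): one row per channel
corr = np.corrcoef(features)     # pairwise correlation
pcorr = partial_correlation(np.cov(features))

print("pairwise corr(x, y):", corr[0, 1])
print("partial  corr(x, y | z):", pcorr[0, 1])
```

In this setup the pairwise correlation between `x` and `y` is large even though they are linked only through `z`, while their partial correlation is close to zero. Plain inversion is only reasonable here because there are many samples and few channels; with few samples relative to the number of channels, the setting the abstract refers to as the small sample problem, a regularised estimator such as SICE would be used instead.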