Visual representations based on covariance matrices have demonstrated their efficacy for image classification by characterising the pairwise correlation of different channels in convolutional feature maps. However, pairwise correlation becomes misleading once another channel correlates with both channels of interest, resulting in the ``confounding'' effect. In this case, ``partial correlation'', which removes the confounding effect, should be estimated instead. Nevertheless, reliably estimating partial correlation requires solving a symmetric positive definite matrix optimisation problem, known as sparse inverse covariance estimation (SICE). How to incorporate this process into a CNN remains an open issue. In this work, we formulate SICE as a novel structured layer of a CNN. To ensure end-to-end trainability, we develop an iterative method to solve the above matrix optimisation during the forward and backward propagation steps. Our work obtains a partial correlation based deep visual representation and mitigates the small sample problem often encountered by covariance matrix estimation in CNNs. Computationally, our model can be effectively trained on GPUs and works well with the large number of channels in advanced CNNs. Experiments show the efficacy and superior classification performance of our deep visual representation compared to covariance matrix based counterparts.
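
The confounding effect described above can be illustrated with a small numerical sketch. It is not the SICE layer proposed in this work: it assumes plenty of samples and simply inverts the sample covariance matrix, then uses the standard identity that the partial correlation between channels i and j is -P_ij / sqrt(P_ii P_jj), where P is the inverse covariance (precision) matrix. The three synthetic "channels" `x`, `y`, `z` and the helper name `partial_correlation` are hypothetical and chosen only for illustration.

```python
import numpy as np


def partial_correlation(samples):
    """Partial correlation matrix from samples of shape (n_samples, n_channels).

    Uses a plain matrix inverse of the sample covariance, which is only
    reliable when n_samples is much larger than n_channels. SICE replaces
    this step with a sparsity-regularised estimate.
    """
    cov = np.cov(samples, rowvar=False)
    precision = np.linalg.inv(cov)
    scale = np.sqrt(np.diag(precision))
    pcorr = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcorr, 1.0)
    return pcorr


rng = np.random.default_rng(0)
n = 5000

# z is the confounder: it drives both x and y, which are otherwise unrelated.
z = rng.standard_normal(n)
x = z + 0.5 * rng.standard_normal(n)
y = z + 0.5 * rng.standard_normal(n)
samples = np.stack([x, y, z], axis=1)

corr = np.corrcoef(samples, rowvar=False)
pcorr = partial_correlation(samples)

print("pairwise correlation of x and y:", corr[0, 1])
print("partial correlation of x and y given z:", pcorr[0, 1])
```

In this toy setting the pairwise correlation between `x` and `y` is large although the two are linked only through `z`, whereas the partial correlation is close to zero. With few samples relative to the number of channels the plain inverse becomes unstable or undefined, which is the situation where a regularised estimate such as SICE is needed.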