We study kernel methods in machine learning from the perspective of feature subspaces. We establish a one-to-one correspondence between feature subspaces and kernels and propose an information-theoretic measure for kernels. In particular, we construct a kernel from the Hirschfeld--Gebelein--R\'{e}nyi maximal correlation functions, which we call the maximal correlation kernel, and demonstrate its information-theoretic optimality. We use the support vector machine (SVM) as an example to illustrate the connection between kernel methods and feature extraction approaches. We show that the kernel SVM with the maximal correlation kernel achieves the minimum prediction error. Finally, we interpret the Fisher kernel as a special case of the maximal correlation kernel and establish its optimality.
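To make the construction concrete, the following is a minimal sketch for a finite-alphabet setting, where the maximal correlation functions can be read off a singular value decomposition of the normalized joint distribution matrix. The abstract does not give the exact definition of the maximal correlation kernel, so the form used here, k(x, x') = sum_i f_i(x) f_i(x'), is an assumption. The function names, the toy joint distribution, and the choice of k are hypothetical as well.

```python
import numpy as np


def maximal_correlation_features(p_xy, k):
    """Return the top-k maximal correlation functions of X for a discrete joint pmf.

    p_xy : array of shape (|X|, |Y|) with strictly positive marginals.
    k    : number of feature functions, at most min(|X|, |Y|) - 1.
    """
    p_xy = np.asarray(p_xy, dtype=float)
    p_x = p_xy.sum(axis=1)
    p_y = p_xy.sum(axis=0)
    # Normalized joint distribution matrix B[x, y] = P(x, y) / sqrt(P(x) P(y)).
    b = p_xy / np.sqrt(np.outer(p_x, p_y))
    u, s, _ = np.linalg.svd(b, full_matrices=False)
    # The leading singular pair (value 1) corresponds to constant functions,
    # so it is skipped; the next k left singular vectors give f_1, ..., f_k.
    f = u[:, 1:k + 1] / np.sqrt(p_x)[:, None]
    return f, s[1:k + 1], p_x


def maximal_correlation_kernel(p_xy, k):
    """Gram matrix over the alphabet of X (assumed form: sum_i f_i(x) f_i(x'))."""
    f, _, _ = maximal_correlation_features(p_xy, k)
    return f @ f.T


# Hypothetical 3x3 joint distribution, used only for illustration.
P = np.array([[0.20, 0.05, 0.05],
              [0.05, 0.20, 0.05],
              [0.05, 0.05, 0.30]])
F, sigma, px = maximal_correlation_features(P, k=2)
K = maximal_correlation_kernel(P, k=2)
```

Under these assumptions, each column of `F` has zero mean and unit variance with respect to the marginal of X, and `sigma` holds the corresponding correlation coefficients. The matrix `K` is symmetric positive semidefinite, so it could be handed to any SVM implementation that accepts a precomputed Gram matrix.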