Traditional approaches to learning on categorical data underexploit the dependencies between columns (a.k.a. fields) of a dataset because they rely on an embedding of data points that is driven solely by the classification/regression loss. In contrast, we propose a novel method for learning on categorical data that explicitly exploits dependencies between fields. Instead of modelling feature statistics globally (i.e., via the covariance matrix of features), we learn a global field dependency matrix that captures dependencies between fields, and we then refine this matrix at the instance level, with different weights for each field (so-called local dependency modelling), to improve the modelling of field dependencies. Our algorithm follows the meta-learning paradigm: the dependency matrices are refined in the inner loop without the use of labels, whereas the outer loop interleaves updates of the embedding matrix (the matrix performing the projection) and the global dependency matrix in a supervised fashion (with labels). Our method is simple, yet it outperforms several state-of-the-art methods on six popular benchmark datasets. Detailed ablation studies provide additional insights into our method.
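To make the two-loop structure concrete, the following is a minimal NumPy sketch and not the authors' implementation. Everything specific in it is an assumption made for illustration: the sizes `F`, `V`, `D`, the label-free inner objective (reconstructing each field embedding from the other fields), the per-field step sizes `alpha` standing in for the "different weights" per field, the linear head `w`, `b`, and the first-order approximation used in the outer update.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: number of fields, categories per field, embedding dimension.
F, V, D = 4, 10, 8
E = rng.normal(0.0, 0.1, (F, V, D))            # embedding tables, one per field
G = np.eye(F) + rng.normal(0.0, 0.01, (F, F))  # global field dependency matrix
alpha = np.full(F, 0.1)                        # assumed per-field inner-loop step sizes
w = rng.normal(0.0, 0.1, (F, D))               # assumed linear classification head
b = 0.0
OFF_DIAG = 1.0 - np.eye(F)                     # mask that zeroes the diagonal


def embed(x):
    """Map integer category ids of shape (F,) to field embeddings of shape (F, D)."""
    return E[np.arange(F), x]


def inner_loss_grad(A, X):
    """Assumed label-free objective: predict each field from the other fields.

    loss = 0.5 * || (A * OFF_DIAG) @ X - X ||^2, with gradient taken w.r.t. A.
    """
    R = (A * OFF_DIAG) @ X - X
    return 0.5 * np.sum(R ** 2), (R @ X.T) * OFF_DIAG


def refine(G, X, steps=3):
    """Inner loop: instance-wise refinement of G, using no labels."""
    A = G.copy()
    for _ in range(steps):
        _, g = inner_loss_grad(A, X)
        A -= alpha[:, None] * g                # row i uses the step size of field i
    return A


def outer_step(x, y, lr=0.05):
    """Outer loop: supervised update of E, G and the head on one labelled example."""
    global b
    X = embed(x)
    A = refine(G, X)                           # local (instance-level) dependency matrix
    Z = A @ X                                  # dependency-aware field representations
    logit = np.sum(w * Z) + b
    p = 1.0 / (1.0 + np.exp(-logit))
    loss = -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    dlogit = p - y                             # gradient of the logistic loss
    dZ = dlogit * w
    dA, dX = dZ @ X.T, A.T @ dZ
    # First-order approximation (assumption): the refinement is treated as a
    # constant offset, so the gradient w.r.t. A is applied directly to G and the
    # dependence of the refinement on X is ignored.
    G[...] -= lr * dA
    E[np.arange(F), x] -= lr * dX
    w[...] -= lr * dlogit * Z
    b -= lr * dlogit
    return loss


# Toy data: the label depends on two fields jointly.
xs = rng.integers(0, V, size=(64, F))
ys = ((xs[:, 0] + xs[:, 1]) % 2).astype(float)
for epoch in range(5):
    avg = np.mean([outer_step(x, y) for x, y in zip(xs, ys)])
    print(f"epoch {epoch}: mean loss {avg:.4f}")
```

The sketch keeps the split described above: `refine` never sees `y`, while `outer_step` uses the label to update the embeddings and the global matrix. A faithful implementation would differentiate through the inner loop rather than rely on the first-order shortcut used here.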